Subtractive buried data is a new proposal for conveying a high-data-rate data channel
(with up to 350 kbit/s or more) compatibly within the data stream of an audio CD without
significant impairment of existing CD performance. This proposal uses pseudorandomized
data as noise-shaped subtractive dither for the conventional audio. The new data channel
may be used for high-quality data-reduced related audio channels, or even for
data-compressed video or computer data, while retaining compatibility with existing audio
CD players.



0 INTRODUCTION 

This paper describes a new proposal for burying a 
high-data-rate data channel compatibly within the data 
stream of an audio CD. The maximum rate that can be 
buried without significant impairment of existing CD 
performance is on the order of 220-350 kbit/s, or even 
more (over 500 kbit/s) if variable data rate techniques 
are used. The subtractive buried data proposal in this 
paper replaces a number of the least significant bits 
(LSBs) of the audio words (typically up to four per
channel) by other data and uses the psychoacoustic 
noise-shaping techniques associated with noise-shaped 
subtractive dither to reduce the audibility of the resulting 
added noise down to a subjective perceived level equal 
to that of conventional CD. 

Simply replacing the LSBs of existing audio data 
would, of course, cause a drastic audible modification 
of the existing audio signal for two reasons: 

1) The word length of existing signals would be truncated
to, say, only 12 bit, which would not only reduce
the basic quantization resolution by 24 dB but would
also introduce the problems of added distortion and
modulation noise caused by truncation (for example, see
[1]-[4]).



* Presented at the 94th Convention of the Audio Engineering
Society, Berlin, Germany, 1993 March 16-19; revised 1994
November 2.

2) In addition, the replaced last, say, 4 LSBs would
themselves constitute an added noise signal, which itself
may not have a perceptually desirable random-noise-like
quality and will also add to the perceived noise level in
the main audio signal, typically increasing the noise by
a further 3 dB above that due to truncation alone, giving
in this case as much as 27 dB total degradation in
noise performance.
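
The two figures quoted in reasons 1) and 2) can be checked with a few lines of arithmetic. The sketch below assumes the usual rule of about 6 dB per bit and treats the truncation error and the replaced LSB data as two uncorrelated noise sources of equal power; the function names are illustrative only.

```python
import math


def truncation_loss_db(bits_removed: int) -> float:
    """Loss of quantization resolution when bits_removed LSBs are dropped."""
    return 20.0 * math.log10(2.0 ** bits_removed)


def equal_power_sum_db() -> float:
    """Level increase when two uncorrelated noises of equal power are added."""
    return 10.0 * math.log10(2.0)


loss = truncation_loss_db(4)      # 16-bit words truncated to 12 bit
extra = equal_power_sum_db()      # replaced LSB data added to truncation noise
total = loss + extra
print(f"{loss:.1f} dB + {extra:.1f} dB = {total:.1f} dB")
```

Under these assumptions the result rounds to the 24 dB and 27 dB quoted above.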

This paper describes the following methods of overcoming
all these problems in replacing the last few LSBs
of an audio signal by other data.

1) Using a pseudorandom encode-decode process,
operating only on the LSB data stream itself without
extra synchronizing signals, to make the added LSB data
effectively of random noise form so that the added signal
becomes truly noise-like.

2) Using this pseudorandom data signal as a subtractive
dither signal (for example, see [1]-[4]), so that
simultaneously it does not add to the perceived noise
and it removes all nonlinear distortion and modulation
noise effects caused by truncation. This step is the essence
of the subtractive buried data process. Remarkably,
and unlike in the ordinary subtractive dither case
[3], a special subtractive dither decoder is not needed,
so that the process works on a standard off-the-shelf
CD player.

3) Furthermore, at the encoding stage, incorporating
psychoacoustically optimized noise shaping of the
(subtractive) truncation error, thereby reducing the perceived
truncation noise error further by between 9 and 17 dB,
depending on the psychoacoustic tradeoffs chosen.

Subtractive buried data using methods 1) and 2) can be
applied with or without the noise shaping of method 3).

The overall effect of combining these three processes
is that if one incorporates data into the last few LSBs,
then the effects of distortion, modulation noise, and
perceived audible patterns in the LSB data are completely
removed, and the resulting perceived steady noise is
reduced by around 15-23 dB below that of ordinary
unshaped optimally dithered quantization to the same
number of bits.

For example, using the most extreme noise-shaping
strategy, when the last 4 LSBs of the 16-bit CD word
length are used for buried channel data, the perceived
signal-to-noise ratio is around 91 dB, approximately
the same as ordinary 16-bit CD quality when unshaped
dither is used. With a more moderate noise-shaping
strategy, the last 2½ LSBs of the 16-bit CD word length can
be used for buried channel data, without degrading the
perceptual quality of the 16-bit CD medium.

The result of this process is that as much as 2 × 4
= 8 bit of data per stereo sample is available for buried
data without significant loss of audio quality on CD,
giving a data rate of 8 × 44.1 = 352.8 kbit/s. With
more moderate noise-shaping strategies, 2 × 2½ = 5
bit of data per stereo sample is available for buried data
without significant loss of audio quality on CD, giving
a data rate of 5 × 44.1 = 220.5 kbit/s.
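
The data rates above follow directly from the number of LSBs taken per channel sample. A minimal sketch of that arithmetic is given here, assuming two audio channels at the 44.1-kHz CD sampling rate; the helper name is hypothetical, and the 2-LSB case is included for comparison.

```python
SAMPLE_RATE_KHZ = 44.1   # CD sampling rate per channel
CHANNELS = 2             # stereo


def buried_rate_kbit_s(lsbs_per_channel: float) -> float:
    """Buried data rate in kbit/s for a given number of LSBs taken per channel sample."""
    return lsbs_per_channel * CHANNELS * SAMPLE_RATE_KHZ


for lsbs in (4, 2.5, 2):
    print(f"{lsbs} LSBs per channel: {buried_rate_kbit_s(lsbs):.1f} kbit/s")
```

Fractional values of `lsbs_per_channel` are allowed, since the number of LSBs used need not be an integer.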

The Appendix provides a detailed discussion of the 
tradeoffs in data rate versus perceptual quality for differ- 
ent data rates of buried data and for various options in 
noise shaping and choice of preemphasis for the CD 
medium. The main body of the paper is concerned with 
the technical method of implementing the buried data 
channel. 

While the subtractive buried data process achieves
potentially high data rates for the buried channel, it does
of course reduce the room for improvements in CD audio
quality, approaching 20-bit effective audio quality, as
described in [3], [4]. However, there is no reason why
the process should only be used with one fixed number
of LSBs. By reducing the data rate of the buried channel
to a smaller number of LSBs, one correspondingly
improves the resolution of the audio, for example, achieving
an effective perceived signal-to-noise ratio of around
103 dB for a system using 2 LSBs of data per signal
channel sample, with a data rate still of 176.4 kbit/s.

One can even make the number of LSBs used fractional,
say ½ or 1½ LSBs per sample. This may be
used either to match the buried channel to a desired data
rate precisely, or to minimize the loss of audio quality,
especially at very low data rates.

In addition, by including in the LSB data channel itself
low-rate data indicating the number of LSBs "stolen"
from the main audio channels, it is possible to vary the
number of LSBs stolen in a time-variant way, so that,
for example, more LSBs can be taken by the buried
channel when the resulting error is masked by a high-level
main audio signal. The noise shaping can also be
varied adaptively at the encoding stage so that at high
audio levels, the noise error is maximally masked by
the audio signal. These ideas have been further explored
by Oomen et al. [5],¹ who quote an average bit rate of
500 kbit/s, increasing to nearly 800 kbit/s in loud
passages.

A variable-data-rate approach to transmitting data in
an audio waveform, for use with the NICAM system,
has also been described by Emmett [6]. Here the shape
of the error spectrum is adaptively changed to be masked
by the audio signal. This may or may not have some
common features with the present proposal, as the details
of Emmett's proposal are not clear from his published
preprint.

It is also shown in this paper that with stereo signals 
it is possible to code data jointly in the least significant 
parts of the audio words of the two (or more) channels, 
using a multichannel version of the data-encoding pro- 
cess, involving the use of vector quantizers and subtrac- 
tive vector dithering by a multichannel pseudorandom 
data signal for the dithering. The basic theory of vector 
dithering is described in Section 5, although readers may 
find it best to omit these technically difficult aspects on 
first reading. It is shown that the vector multichannel 
version of the data-coding process ensures left-right 
symmetry of any added noise in the audio reproduction 
and an advantageous noise performance. 

The approach described in this paper is substantially 
different from an alternative method of burying data 
described in [7], which involved a process of splitting 
the audio signal into subbands, replacing the LSBs of 
the subbands with data based on auditory masking the- 
ory, and then reassembling the resulting data by recom- 
bining the subbands. Not only is that process very com- 
plicated, with a considerable time-delay penalty in the 
subband encoding-decoding process, but it has to be 
done with extraordinary precision to prevent data errors 
in the band splitting and recombining process. By con- 
trast, the present process involves little time delay, in- 
volves relatively simple signal processing, and further 
is such as to guarantee the lack of audible side effects 
due to nonlinear distortion, modulation noise, or data- 
related audible patterns. 

1 USES OF BURIED DATA 

1.1 Advantages over CD ROM Media 

The availability of a buried data channel with data
rates on the order of 350 kbit/s without significant loss
of audio quality on audio CDs, fully compatible with
conventional playback on standard audio players, opens
up prospects for many new products. Unlike standards
such as CD-I based on CD-ROM, the additional data can
be added without destroying compatibility with playback
over tens of millions of existing audio players. This
means that the new data channel can be added while still
giving the CD the advantages of mass-market economies
of scale of production, thanks to the existing audio-only
market. Thus applications using the new data channel
should result in much lower prices than for media where
the number of players is limited.

¹ Oomen et al. [5] only consider subtractive buried data
channels stealing an integer number of bits from each of the
stereo channels. If stereo parity buried data methods are used,
as described in Sections 2.2 and 6.3, the available data rate
is typically further increased by around 22 kbit/s.

1.2 Application to Multichannel Sound 

One application of the new data channel is using the
additional bits to add, by means of audio data compression,
additional audio channels for three- or more-loudspeaker
frontal stereo or surround sound, as described,
for example, in [8]-[10]. Because CDs have higher
quality than available data-compression systems (despite
claims of "transparency" or "CD quality" by some less
cautious proponents of such systems), care must be taken
that the additional channels are not too compromised in
quality by the data-compression process, which means
that a rather lesser degree of compression is desirable
than for DAB or film surround sound. However, since
two of the transmitted audio channels are the standard
CD audio channels and the design of the buried channel
avoids nonlinear or modulation noise effects on these
main channels, all the data rate in the buried channel
can be used solely for the additional channels, giving
each a higher data rate than if the buried channel were
used to transmit the whole audio signal. In using the
buried channel to transmit additional directional audio
channels, it is important to design the codec error signals
so that they do not become audible through the mechanism
of directional unmasking described by one of the
authors in [11]-[13].

The data rate available using the most extreme noise-shaping
strategies is sufficient to transmit a Dolby AC-3
or MUSICAM surround five-channel surround-sound
signal, but these systems involve a quality compromise
with the data rate so that this is not a preferred procedure.
Such systems are preferably used in a manner such that
the main two channels conveying a stereo-compatible
mix are conveyed as standard CD audio, with only the
three or more supplementary channels in data-compressed
form.

High-quality data-compressed additional audio channels
can, unlike existing data-compression systems,
minimize the risk of destruction of subtle auditory cues
such as those for perceived distance (see [14]), thereby
maintaining CD digital audio as the preferred medium
for high-quality audio while adding additional channels.
For high-quality (and especially musical) use it may be
preferred to use additional buried audio channels either
for frontal-stage three- or four-loudspeaker stereo or for
three-channel horizontal or four-channel full-sphere
with-height [15], [16] ambisonic surround sound (see
[9], [10], [17]), rather than for the rather cruder theatrical
"surround-sound" effects considered appropriate for cinema
or video-related surround-sound systems. However,
systems have been proposed for intercompatible use of
both kinds of systems [9], [10].

Since the main audio channels in this proposal convey
high-quality audio, it is possible to use the spectral envelope
of the main audio channels to convey most or all
of the dynamic ranging information used for the subbands
in data-reduction systems for related subsidiary
channels conveyed in the buried data channel, especially
if the main audio channels incorporate a mixture of all
the transmitted channels so that no direction is canceled
out. This saves the data overhead of conveying ranging
data, which in high-quality systems may save on the
order of 60 kbit/s as compared to a stand-alone
data-compression system. This will allow a system conveying
n related channels using 4 LSB per main CD audio channel
to give a performance equivalent to that of a stand-alone
data-compression system conveying n - 2 channels
in about 410 kbit/s. For three-channel systems, such
as horizontal B-format surround-sound or three-channel
UHJ [17] or frontal-stage three-channel stereo, this quality
is unlikely to be audibly distinguishable from an
uncompressed data channel. For four-channel systems,
the results will still subjectively approach that of critical
studio-quality material, and even for five-channel material,
the results will be considerably less compromised
than that for DAB or cinema surround sound, using a
data rate for the additional channels of well over twice
that used in those applications.

1.3 Video and Computer Data

Alternatively, the buried data channel can be used 
for conveying related computer data, such as graphics, 
multilingual text, or track copyright information. Be- 
cause of the high available data rate, this can be done 
with very much higher quality than is possible on the 
subcode channels of CD, conveying, for example, with 
JPEG image data compression on the order of one high- 
quality color photographic image per second. A data 
rate of 350 kbit/s is even enough to convey a reasonable 
video image. Using the existing MPEG standard, this
would have very low resolution (although certainly good 
enough for moving inserts within a still image), but near- 
future image data-compression methods based on using 
the highly non-Gaussian nature of images are expected 
to make consumer-quality video available within this 
data rate. 

1.4 Dynamic-Range Data 

Another use would be to convey dynamic-range reduction
or enhancement data, such as a channel conveying
the setting of a gain moment by moment. This would
allow the same CD to be played automatically with different
degrees of dynamic compression according to the
environment by choosing the gain adjustment channel
appropriate for that environment. This would include
the possibility of completely uncompressed quality for
high-quality use, without making the CD incompatible
for more normal use, such as in broadcasting. An advantage
of providing the dynamic-range gain data in the
data subchannel rather than using automated dynamic-range
modification algorithms is that one can always do
a much better subjective job using manual intervention
based on a knowledge of the music and its needs, but
at the expense only of considerable time and effort. This
effort can be recorded for consumer use in the buried
data channel. If automated algorithms are used for the
dynamic-range gain conveyed by the buried data channel,
these can be of a much more sophisticated and subtle
nature than those normally available to the consumer
(for example, [18]).

1.5 Frequency Range Extension 

A further use related to the original audio would be to 
add in the subchannel data-reduced information allowing 
information above 20 kHz to be reconstructed. (See, for 
example, Komamura [19], who uses buried data for this 
purpose, but not subtractive buried data.) One of the 
limitations of CDs is that the frequency range is limited 
to 20 kHz. Although the ears' sine-wave hearing is for 
all, except a small minority of (generally young and 
often female and/or asthmatic) listeners, limited to be- 
low 20 kHz, this does not mean that there is no loss of 
perceived quality caused by the sharp band-limiting to 
20 kHz. It is widely noted that there is a significant 
loss of perceived quality when comparing high-quality 
digital signals sampled at, say, 44.1 kHz as compared 
to 88.2 kHz. 

From a quality viewpoint it may be more important 
to use an extended bandwidth to provide a more gentle 
rolloff rate than to provide a response flat to 40 kHz 
since, unlike the brickwall filters used with ordinary CD,
such gentler rolloffs are similar to those encountered in 
natural acoustical situations. 

The extended bandwidth can be provided by using a
high-order complementary mirror filter pair of the kind
described in Regalia et al. [20] and in Crochiere and
Rabiner [21] to split an 88.2-kHz-rate sampled digital
signal into two bands sampled at 44.1 kHz. The filters
involved will overlap, although using a high-order filter
[20], the region of significant overlap can be reduced to
about 1 kHz. Within the overlap region there will be
aliasing from the other frequency range, although the
reconstruction of the full bandwidth [20], [21] will cancel
out this aliasing. The band below 22.05 kHz can then
be transmitted as the conventional audio, and the band
above 22.05 kHz can be transmitted in data-reduced
form in the buried data channel at a reduced data rate
of, say, between 1 and 4 bit per sample per channel,
using known subband or predictive coding methods.
Phase compensation inverse to the phase response of the
low-pass filter in the complementary filter pair may be
employed to linearize the phase response of the main
sub-22.05-kHz signal for improved results for standard
listeners, with the use of an inverse phase-compensating
filter in the decoding process of reconstructing the wider
bandwidth signal.

1.6 Airplay Mixes

The buried data channels on a CD can be used to
convey in data-reduced form alternative mixes of the
musical material in the main track. For example, the
buried-data audio channel might be an "airplay mix"
designed for optimum effect when heard over AM or
FM radios. At present such airplay mixes have to be
distributed separately for promotional purposes, whereas
buried data allow these versions to be distributed within
the standard CD release.

1.7 Remixable CD 

One potentially important use for buried data in audio 
CDs is for remixable CDs, where the end user has the 
option of changing the mix from that given by standard 
audio playback. This may be done by using the buried 
data to convey in data-reduced form additional audio 
signals, representing differences between the main- 
channel mix and alternative mixes. 

For example, in library music applications, where musical
material suitable for use as backing to radio, TV,
audiovisual, and multimedia productions is provided on
CD, the main stereo audio can be used for a "standard
mix" of three stereo components,

$$m_1 M + h_1 H + r_1 R$$

say a melody line $M$, a harmony line $H$, and a rhythm
line $R$. The buried data channels can be used to convey
alternative mixes $m_2 M + h_2 H + r_2 R$ and
$m_3 M + h_3 H + r_3 R$, where some of the mixing
coefficients may be zero or negative. Then a new mix can be
derived by recovering $M$, $H$, and $R$ separately by inverse
matrixing and then mixing them together using a conventional
user-adjustable mixing process.
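
The inverse matrixing step can be sketched for a single sample as follows. The mixing coefficients below are hypothetical, the helper names are illustrative, and the sketch ignores the fact that the buried mixes would in practice be data-reduced, so that real recovery would only be approximate.

```python
def det3(a):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def recover_components(coeffs, mixes):
    """Recover (M, H, R) from three mixes of one sample by Cramer's rule."""
    d = det3(coeffs)
    if abs(d) < 1e-12:
        raise ValueError("mixing matrix is singular; components cannot be separated")
    solution = []
    for col in range(3):
        m = [row[:] for row in coeffs]
        for r in range(3):
            m[r][col] = mixes[r]       # replace one column by the mix values
        solution.append(det3(m) / d)
    return tuple(solution)


# Hypothetical coefficients: row 0 is the standard mix, rows 1-2 the buried mixes.
COEFFS = [[1.0, 1.0, 1.0],
          [1.0, 0.0, -1.0],
          [0.0, 1.0, 0.5]]

original = (0.25, -0.5, 0.125)         # one sample of M, H, R
mixes = [row[0] * original[0] + row[1] * original[1] + row[2] * original[2]
         for row in COEFFS]

M, H, R = recover_components(COEFFS, mixes)
remix = 0.5 * M + 1.0 * H + 2.0 * R    # a user-chosen new balance
```

The three coefficient rows must be linearly independent for the components to be separable, which is why the sketch rejects a singular matrix.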

Besides use for library music applications, the remixable
CD can also be used in applications where
hearing-impaired listeners, who form a significant proportion of
the public, can raise the level of vocal lines for enhanced
intelligibility. Further consumer applications include
allowing consumers to prepare their own mixes, removal
of "spot" microphones in classical music recordings,
and multilingual recordings where the vocals can be
provided in several languages. A buried audio channel
can also be used to add or subtract vocals in music for
Karaoke applications.

1.8 MIDI Applications 

Another musical application would be to convey MIDI
control data in the buried data channel. This can be used
to control additional musical lines from MIDI modules
as part of an overall musical mix that may be user adjustable.
Each MIDI channel requires a data rate of 31.25
kbit/s, allowing, for example, four MIDI channels to be
conveyed by stealing 1½ bit from each of the two stereo
channels on the CD.
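
The MIDI figure can be checked in the same way as the other data rates. The sketch below assumes that the whole buried capacity is available for MIDI, with no framing or signaling overhead; the function name is hypothetical.

```python
MIDI_RATE_KBIT_S = 31.25   # data rate of one MIDI channel, as stated above
SAMPLE_RATE_KHZ = 44.1     # CD sampling rate per channel


def midi_channels_supported(lsbs_per_channel: float, audio_channels: int = 2) -> int:
    """Number of whole MIDI channels that fit in the buried data capacity."""
    capacity = lsbs_per_channel * audio_channels * SAMPLE_RATE_KHZ
    return int(capacity // MIDI_RATE_KBIT_S)


print(midi_channels_supported(1.5))
```

With 1½ bit per channel the capacity is 132.3 kbit/s, against 125 kbit/s needed for four MIDI channels.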

1.9 Combined Applications 

Any or all of these uses can, of course, be combined,
subject only to the restrictions of the data rate, so that
the buried data channel could be used, for example, to
convey one additional audio channel, a dynamic-range
gain signal, extended bandwidth, and additional graphics,
text (possibly in several languages), copyright, and
even insert video data, as appropriate.

A specific audiophile example is the possibility of
using the extra data to convey three-channel frontal-stage
stereo or three-channel ambisonic surround sound
where all channels have extended bandwidth. This involves
conveying one extra channel of audio in data-compressed
form plus three channels of extended bandwidth
data.

For historical material, where the dynamic range may
be significantly less than 90 dB, it may even be possible
to increase the data rate available further by allocating
even more bits to the buried data channel since an increased
noise level may not be significant. For this reason,
it may be desirable to allow the possibility of allocating
as many as 12 or even 16 bits of audio data (say,
bits 10 to 15 or even 8 to 15 of each audio channel) to
the buried data channel.

2 PSEUDORANDOM CODING OF DATA 

2.1 Pseudorandomized Data 

It is essential, if the LSBs of an audio signal are to 
be replaced by data, that the replacing data should truly 
resemble a random noise signal (albeit perhaps one that 
may be spectrally shaped for psychoacoustic reasons). 
Most data signals, when listened to as though they were 
digital audio signals, have some degree of systematic 
pattern which may well prevent them from sounding or 
behaving truly like random noise. Such departures from 
random noise-like behavior are generally much more 
perceptually disturbing or distracting than a simple 
steady noise. 

Also, if we can ensure that added data behave like a
noise signal with known statistical properties, one can
use all that is known in the literature on dither and noise
shaping (see [1]-[4], [22]-[25]) to optimize the perceptual
properties of the added data to minimize their audible
effects.

The data signal is rendered pseudorandom with predictable
statistics in our proposal by a data encode-decode
process, the encode process having the effect
of pseudorandomizing the data signal, and the decode
process having the effect of recovering the original data
signal from the pseudorandomized data signal, as illustrated
in Fig. 1. From a practical point of view it is highly
desirable that the encode and decode process require no
use of an external synchronizing signal, but that the
decode process should work entirely from the
pseudorandomized data sequence itself.

Fig. 1. Pseudorandom encoding and decoding of data transmitted
via CD channel to ensure noise-like behavior.

The simplest way of constructing such an encode-decode
pseudorandomizing process for data is to use a
cyclic pseudorandom logic sequence generator separately
on each bit, as was realized in 1967 by Savage
[26] and implemented in a commercial data-scrambling
product by Hewlett-Packard in 1971 [27]. For example,
with its input set to zero, the network of Fig. 2(a) is a
well-known binary pseudorandom logic sequence generator
using feedback around three logic elements and a total
shift register delay of 16 samples. (A one-sample delay is
denoted by the usual notation $z^{-1}$.) Provided that the
logic state in the 16 samples stored in the shift register is
not all zero, this binary sequence generator has the 16 logic
states cycle through all $2^{16} - 1 = 65\,535$ nonzero states
in a pseudorandom manner.

If, instead of using a zero input, the pseudorandom
sequence generator of Fig. 2 is fed with a binary data
stream $s_n$, then it has the effect of a pseudorandomizer
for the input data. This encoding scheme is based on
the recursive logic

$$t_n = s_n \oplus t_{n-1} \oplus t_{n-3} \oplus t_{n-14} \oplus t_{n-16} \qquad (1)$$

where $t_n$ is the output binary logic value of the network
at integer sample time $n$, $s_n$ is the input binary logic
value of the network at integer sample time $n$, and $\oplus$
represents the logic exclusive-or or Boolean addition
operator (with truth table $0 \oplus 0 = 1 \oplus 1 = 0$,
$0 \oplus 1 = 1 \oplus 0 = 1$).

Conversely, if exactly the same arrangement of logic
gates is fed with the pseudorandomized data $t_n$, then the
effect of the exclusive-or gates on the input signal is
to restore the original data stream. This is achieved by
the inverse decoding logic process

$$s_n = t_n \oplus t_{n-1} \oplus t_{n-3} \oplus t_{n-14} \oplus t_{n-16} \qquad (2)$$

illustrated in Fig. 2(b).

Thus by using a logic network recursively with delay
of a total of L = 16 samples and only four exclusive-or
gates, a binary data stream can be pseudorandomized,
and the same network can decode the data stream back
to its original form. For constant signals there is a one
in 65 536 chance that the undesirable nonrandom zero
state will be encountered, but this low probability is
probably acceptable, given that even a single binary digit
change of the input is likely to "jog" the system back
into a pseudorandom output state.

Fig. 2. Binary pseudorandom sequence generator using
shift-register logic, with input EXCLUSIVE-OR gate for encoding
and decoding of binary data stream.

Other well-known pseudorandom binary sequence
logic generators with shift registers of length L longer
than 16 samples can be used for encoding and decoding
in the same way, with their fed-back output given by
subjecting the delayed sequence output and the input to
a sum logic gate. Such length-L sequences will have,
for a constant input, only one chance in $2^L - 1$ of giving
an unrandomized output, and will have a sequence
length of $2^L - 1$ samples.

Although the pseudorandom binary sequence generator
described in Eq. (1) and Fig. 2 is a maximum-length
sequence for a zero input, it has a shorter length for an 
all-1 constant input, and in general, the precise behavior 
with, say, periodic inputs is hard to predict. Partly for 
this reason it is not absolutely essential to use a maxi- 
mum-length sequence generator, provided that the 
length of the sequence is not too short for constant 
inputs. 

It will be noted that the network of Fig. 2 only has 
L = 16 samples of memory, so that when used as a 
decoder, any data errors in the input will only propagate 
for L samples, and then the output will recover. This 
lack of long-term memory in the decoding process means 
that there are no special requirements on the error rate 
of the transmission channel. Because of the small num- 
ber of logic elements in Fig. 2, a single sample error in 
the received data stream will only cause five sample 
errors in the decoded output. 
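
The self-synchronizing behavior described above can be illustrated with a short sketch. The tap delays (1, 3, 14, 16) are taken here as an assumption, and the function names are illustrative; the round-trip and error-propagation properties demonstrated do not depend on the particular choice of taps, only on there being four of them with a maximum delay of 16 samples.

```python
from typing import List, Sequence

# Assumed feedback delays with a total shift-register length of 16 samples.
TAPS = (1, 3, 14, 16)


def encode(bits: Sequence[int], taps: Sequence[int] = TAPS) -> List[int]:
    """Recursive scrambler: t[n] = s[n] XOR t[n-d] over all delays d (zero initial state)."""
    coded: List[int] = []
    for n, s in enumerate(bits):
        t = s
        for d in taps:
            if n >= d:
                t ^= coded[n - d]
        coded.append(t)
    return coded


def decode(coded: Sequence[int], taps: Sequence[int] = TAPS) -> List[int]:
    """Feed-forward descrambler: s[n] = t[n] XOR t[n-d] over all delays d."""
    data: List[int] = []
    for n, t in enumerate(coded):
        s = t
        for d in taps:
            if n >= d:
                s ^= coded[n - d]
        data.append(s)
    return data


# Round trip on an arbitrary test pattern.
data = [(n * n + n // 3) & 1 for n in range(200)]
coded = encode(data)
assert decode(coded) == data

# Corrupt a single sample of the coded stream and locate the decoded errors.
corrupted = list(coded)
corrupted[50] ^= 1
errors = [n for n, (a, b) in enumerate(zip(decode(corrupted), data)) if a != b]
print(errors)   # [50, 51, 53, 64, 66]
```

Because the decoder is purely feed-forward, the single corrupted sample affects exactly five decoded samples, the last of them 16 samples after the error, after which the output recovers.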

As shown in Fig. 3, typically, for use with CDs, the 
data will first be arranged to form a number of bits of 
data per sample of each audio channel, for example, 8 
bit of data constituting bits 12 to 15 of the left and right 
audio channels [where bit 0 is the most significant bit 
(MSB) of a 16-bit audio word and bit 15 the LSB].

Then each of these (say 8) bits will, separately, be 
encoded by a pseudorandom logic such as that of Fig. 
2 to form a pseudorandom sequence, and the resulting 
pseudorandomized bits will be used to replace the origi- 
nal bits in, say, bits 12 to 15 of the left and right audio 
channels. The resulting noise signals in the left and right 
audio channels will be termed the (left and right) data 
noise signals. 

Alternatively, instead of pseudorandomizing individual
bits of the audio words representing data separately,
they can be pseudorandomized jointly by regarding the
successive data bits of a word as being ordered sequentially
in time, and applying a pseudorandom encoder
such as that in Fig. 2 to this sequence of bits. For example,
8 bit of data per audio sample can be ordered sequentially
before the next 8 bit of data corresponding to the
next audio sample, and the pseudorandom logic encoding
can be applied to this time series of bits at eight
times the audio sampling rate.

An advantage of this strategy is that errors in received
audio samples propagate for (in this example) only
one-eighth of the time that they would in the case where each
word bit is separately pseudorandomized.

M-level data signals, taking one of M possible values and
conveying $\log_2 M$ bits per sample, can also be
pseudorandomized by a direct process involving congruence
techniques, whereby the coded version $w'_n$ of the current
sample M-level word $w_n$ is given by

$$w'_n = w_n + \sum_{j=1}^{L} a_j w'_{n-j} \pmod{M} \qquad (3)$$

where the $a_j$ are (modulo M) integer coefficients chosen
(if necessary by empirical trial and error) to ensure that
all M possible constant inputs result in a pseudorandomized
output with reasonably long sequence lengths.
The inverse decoding of the pseudorandomized M-level
words is

$$w_n = w'_n - \sum_{j=1}^{L} a_j w'_{n-j} \pmod{M}. \qquad (4)$$

Fig. 3. Schematic of processing of data to form audio
noise-like signal.



The logic techniques described with reference to Fig.
2 are just the special case when M = 2 of this more
general congruence technique. The congruence technique
can result in sequence lengths for constant inputs
of length up to a maximum of $M^L - 1$ samples, so that
in general the larger the value of M, the smaller L need
be with a consequent shortening of the time duration of
propagation errors.
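
A minimal sketch of the M-level congruence technique follows. The coefficient values and the choice M = 16 are hypothetical and have not been checked for long sequence lengths; the sketch only demonstrates that the recursive encoder and the feed-forward decoder are exact inverses modulo M.

```python
from typing import List, Sequence


def encode_mlevel(words: Sequence[int], coeffs: Sequence[int], M: int) -> List[int]:
    """Coded word = input word plus weighted sum of previous coded words, modulo M."""
    coded: List[int] = []
    for n, w in enumerate(words):
        acc = w
        for j, a in enumerate(coeffs, start=1):
            if n >= j:                      # zero initial state
                acc += a * coded[n - j]
        coded.append(acc % M)
    return coded


def decode_mlevel(coded: Sequence[int], coeffs: Sequence[int], M: int) -> List[int]:
    """Input word = coded word minus weighted sum of previous coded words, modulo M."""
    words: List[int] = []
    for n, c in enumerate(coded):
        acc = c
        for j, a in enumerate(coeffs, start=1):
            if n >= j:
                acc -= a * coded[n - j]
        words.append(acc % M)
    return words


M = 16
COEFFS = (1, 0, 5, 3)    # hypothetical coefficients a_1..a_4, so L = 4
words = [(3 * n * n + n) % M for n in range(100)]
coded = encode_mlevel(words, COEFFS, M)
restored = decode_mlevel(coded, COEFFS, M)
```

With M = 2, addition and subtraction modulo M both reduce to exclusive-or, which gives the binary logic described with reference to Fig. 2.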

A slightly more complex pseudorandomization of data
will provide an initial pseudorandomization of M-level
data by a method such as one of those described here,
and follow it by an additional one-to-one map between
the M possible data values. The decoding will first subject
the M levels to an inverse map before applying the
inverse of the above pseudorandom encodings.

There are many similar but more complicated methods 
of pseudorandomization of data streams. As we have 
seen these need have no coding delay or increase in data 
rate after coding, and they can limit the duration of any 
errors in received data in the inversely decoded output 
to not more than a few samples after the occurrence of 
an erroneous audio sample. 

As audio signals, the resulting pseudorandomized data 
noise signals have a steady white-noise spectrum and a 
(discrete) uniform or rectangular probability distribution 
function (PDF) in the example case described, having 
16 levels in each of the left and right channels. Such 
discrete noise does not have the ideal properties of rect- 
angular dither noise, although Wannamaker et al. [22] 
have shown that it approximates many of these desirable 
properties in a precise mathematical sense. However, 
adding to it an extra random or pseudorandom white
rectangular PDF noise signal with peak level ±1/2 LSB
converts it into noise with a true rectangular PDF, with
peak levels (in this example) of ±8 LSBs. In this case
the added noise to convert from a discrete to a continuous
PDF is at a very low level, being 24 dB below the level
of the data noise signal.

2.2 Stereo Parity Coding 

Although in the preceding example we described data 
being conveyed separately on each audio word bit of the 
data signal, it will be realized that data can alternatively 
be conveyed by more complicated combinations of the 
LSBs of audio words (in any numerical base M, not just 
the binary base 2) — for example, on the Boolean sum of 
the corresponding bit in the left and right audio signals. 

For example, consider the case where a data rate of 
only 1 bit per stereo audio sample is required. Such a 
signal can be conveyed as the Boolean sum of the LSB 
in the left and right audio channels, leaving the values 
of the LSB in individual audio channels separately un- 
constrained. Conveying a data channel using the Bool- 
ean sum of the corresponding bits of the left and right 
audio signals is herein termed stereo parity coding. 

It is of course desirable that the effect on the conven- 
tional audio of reallocating bits to a buried data channel 
should be left-right symmetrical. In particular, if a bur- 
ied data channel is used with a data rate of just 1 bit per 
stereo sample (BPSS), then one does not wish to code 
the data in the LSBs of only one of the two stereo channels.
If the values of the respective Nth bits of the respective
left and right channel signals are denoted by $L_n^N$ and
$R_n^N$ at time n, then one codes a pseudorandomized 1-bit
per sample data channel $t_n$ as

$$t_n = L_n^N \oplus R_n^N. \tag{5}$$

This encoding can be accomplished by flipping, if necessary,
the parity of the Nth bit in either of the two stereo
channels to ensure the desired value of $t_n$. The added
error noise is minimized by flipping that channel whose
quantization was closer to a decision threshold, as described
more fully in Section 6.3.
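
The rule of flipping the channel that was closer to a decision threshold can be illustrated with a short Python sketch. It is deliberately simplified and rests on assumptions: it handles only the LSB (bit 0), takes the high-resolution samples as floating-point values in LSB units, and omits dither and noise shaping. The function names are hypothetical.

```python
import math


def stereo_parity_encode(left, right, bit):
    """Quantize one stereo sample so that the XOR of the two LSBs equals `bit`."""
    ql = math.floor(left + 0.5)
    qr = math.floor(right + 0.5)
    if (ql ^ qr) & 1 != bit:
        err_l = left - ql    # rounding errors, each within half an LSB
        err_r = right - qr
        # Move the channel with the larger rounding error to its other
        # neighboring level: that channel was closer to a decision threshold.
        if abs(err_l) >= abs(err_r):
            ql += 1 if err_l >= 0 else -1
        else:
            qr += 1 if err_r >= 0 else -1
    return ql, qr


def stereo_parity_decode(ql, qr):
    """Recover the data bit as the Boolean sum of the two LSBs."""
    return (ql ^ qr) & 1
```

With inputs 10.1 and 20.4 and data bit 1, plain rounding gives (10, 20), whose parity is 0, so the right channel (rounding error 0.4 against 0.1) is moved up to 21.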

If desired, an additional second pseudorandomized 1-bit
per sample data channel $u_n$ can be encoded in the
Nth bit of the stereo audio signal, say, as

$$u_n = L_n^N \tag{6}$$

in which case the data can be encoded via $L_n^N = u_n$,
$R_n^N = L_n^N \oplus t_n$ and decoded via $u_n = L_n^N$,
$t_n = L_n^N \oplus R_n^N$. Alternatively $u_n$ can be encoded
as $R_n^N$. The use of stereo parity encoding allows the
separate 1-BPSS data channels to be decoded separately while
maintaining left-right symmetry in the audio when an odd
number of 1-BPSS channels is used.

One could standardize a basic 1-BPSS data channel
as being conveyed via the parity (Boolean sum) of the
LSBs (that is, bit 15) of the left and right audio channels.
Information about the way other data channels conveying
more BPSS are coded will, in such a standardization,
be conveyed by this basic data channel. By this means,
a data decoder can read from the basic 1-BPSS stereo
parity data channel how to decode other data channels
present, if any. In particular, this allows, if desired,
moment-by-moment variation of the data rate, either
adaptively to the amount of data needing transmission
or adaptively to the audio signal according to its varying
ability to mask the error signal caused by the hidden
data channels.

For example, in loud passages in pop or rock music,
the data rate allocated to, say, a video signal could be
increased, allowing quite high-quality video images in,
say, heavy metal music.

2.3 Fractional Bit Rates 

There is no reason why the buried data channels
should be restricted to data rates of an integer number
of BPSS, although this may be a convenient implementation.
Several methods can be used to allocate less significant
parts of audio words to data at fractional bit
rates.

One method conveys $\log_2 M$ bits for integer M in the
less significant parts of audio words by conveying data
in the M possible values of the remainder of the integer
audio word after division by M, whereas the rounding
quantization process used for the audio involves
rounding to the nearest multiple of M. For M a power
of 2, this reduces to conventional quantization to $\log_2 M$
fewer bits.
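
As an illustration of carrying data in the remainder modulo M, the following Python sketch embeds one M-level symbol in one audio word. It assumes that the symbol is subtracted before rounding to a multiple of M and added back afterward, in the manner described in Section 4.1; the function names are hypothetical, and dither statistics and noise shaping are ignored.

```python
import math


def embed_mod_m(sample, symbol, M):
    """Return the integer nearest to `sample` (in LSB units) whose
    remainder after division by M equals `symbol`."""
    return M * math.floor((sample - symbol) / M + 0.5) + symbol


def extract_mod_m(word, M):
    """Read the data symbol back as the remainder modulo M."""
    return word % M


def bits_per_sample(M):
    """Data capacity of one M-level symbol."""
    return math.log2(M)
```

With M = 3 each sample carries about 1.585 bits, and the embedded word never differs from the original sample by more than M/2 LSBs.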

In Eqs. (3) and (4) we described how such M-level
data channels can be pseudorandomized by pseudorandom
congruence encoding and decoding. Alternatively,
if M can be expressed as a nontrivial product of
K = 2 or more integer factors, $M = \prod_{k=1}^{K} M_k$, then one
can uniquely expand the M-level data word w in the form

$$w = \sum_{k=1}^{K} w_{(k)} \prod_{j=1}^{k-1} M_j \tag{7}$$

with $w_{(k)}$ an integer between 0 and $M_k - 1$. Eq. (7)
is the generalization of the expansion of a number to
base $M_0$ in the case $M_j = M_0$ for all j = 1, ..., K.
Each of the expansion coefficients $w_{(k)}$ can, if desired,
be pseudorandomized separately before the final length
M word is formed. Again, this generalizes the binary
case described where the $M_j$ equaled 2.
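
The expansion over several bases, and its recovery by successive division, can be sketched as follows. The sketch assumes the convention that the first coefficient is the least significant one (an empty product of bases counting as 1); the function names are hypothetical.

```python
def mixed_radix_pack(coeffs, bases):
    """Combine coefficients w_(k), each in [0, M_k), into one word in [0, M)."""
    word, weight = 0, 1
    for c, base in zip(coeffs, bases):
        if not 0 <= c < base:
            raise ValueError("coefficient out of range for its base")
        word += c * weight
        weight *= base
    return word


def mixed_radix_unpack(word, bases):
    """Recover the coefficients by K successive divisions, keeping each remainder."""
    coeffs = []
    for base in bases:
        word, remainder = divmod(word, base)
        coeffs.append(remainder)
    return coeffs
```

For bases (2, 3, 5), so that M = 30, the coefficients (1, 2, 4) pack to 1 + 2·2 + 4·6 = 29, and unpacking 29 returns the same three coefficients.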

A second method for fractional bit rates, especially
suitable for very low data rates of 1/q BPSS for integer
q, is to code data only in one out of every q audio samples.
The encoding schemes are as before, but with a data
sampling rate divided by q, and decoding involves the
decoder trying out and attempting to decode each of the
q possible subsequences until it finds out (for example,
by confirming a parity check encoded into the data)
which one carries data.

For integers p < q a data rate of p/q BPSS can similarly
be obtained by encoding data in the LSBs of p out of
every q samples (for example, samples 1 and 3 out of
every successive 5 samples for p = 2 and q = 5).

A third method for fractional bit rates also codes data
in the LSBs of q successive samples, but codes the data
into different logical combinations of all q bits. For
example, a data rate of 1/q BPSS can be obtained by
encoding data as the parity (Boolean sum) of the q LSBs.
It turns out that this option is often capable of significantly
less audio noise degradation than the simpler
scheme of the second method. A part of the advantage
is that if one needs to modify the parity, then one can
choose to modify that sample out of the q successive
samples that will cause the least error in an original
high-resolution audio signal, rather than being forced to
alter a fixed sample.
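
A hedged Python sketch of this third method for a rate of 1/q BPSS follows. It assumes floating-point input samples in LSB units and omits dither and noise shaping; the function names are hypothetical.

```python
import math


def encode_parity_group(samples, bit):
    """Quantize q samples so that the Boolean sum of their LSBs equals `bit`,
    altering (if needed) the one sample that adds the least error."""
    words = [math.floor(x + 0.5) for x in samples]
    if sum(words) & 1 != bit:
        errors = [x - w for x, w in zip(samples, words)]
        # The sample with the largest rounding error is nearest to its
        # other neighboring level, so moving it costs the least.
        k = max(range(len(samples)), key=lambda i: abs(errors[i]))
        words[k] += 1 if errors[k] >= 0 else -1
    return words


def decode_parity_group(words):
    """Recover the bit as the parity of the q LSBs."""
    return sum(words) & 1
```

For the group (3.1, 7.45, -2.2, 9.0) and data bit 0, plain rounding gives parity 1, so the second sample is moved from 7 to 8 at a cost of 0.55 LSB of error; a scheme forced to alter the first sample would instead incur an error of 0.9 LSB.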

We shall see in the following that, for all three kinds 
of fractional bit rate data encoding, it is possible to use 
a subtractive dithering technique by a data noise signal 
to eliminate unwanted modulation noise and distortion 
side effects on the modified waveform data. The advan- 
tages of the subtractive buried data process are not con- 
fined to integer bit rates per sample. 

3 SUBTRACTIVELY DITHERED NOISE SHAPING

3.1 Subtractive Dither 

Here we briefly review the ideas of subtractively dithered
noise shaping, detailed by the authors in [1], [3],
and [4]. In this paper, by a "quantizer" we mean a
signal-rounding operation that takes higher resolution audio
words and rounds them off to the nearest available level
at a lower resolution. We assume that the quantizer is
uniform, that is, the available quantization levels are
evenly spaced, with a spacing or step size denoted by
STEP.

The quantizer rounding process introduces nonlinear
distortion, but this distortion may be replaced by a benign
white-noise error at the same typical noise level by
using the process of subtractive dither shown in Fig. 4.
The process comprises adding a dither noise before the
quantizer and subtracting the same dither noise afterward.
Provided that the statistics of the dither noise are
suitable, it can be shown (see [1], [2]) that this results
in the elimination of all correlations between the error
signal across the subtractively dithered quantizer and the
input signal. One such suitable dither statistic is what
we term RPDF dither, that is, dither where each sample
is statistically independent of the other samples and with
a rectangular probability distribution function having
peak levels of ±1/2 STEP.
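
The effect of subtractive RPDF dither can be checked numerically with the sketch below. To keep the result deterministic it replaces random dither by an evenly spaced set of dither values covering ±1/2 STEP, which is an assumption made for illustration only; the function names are likewise hypothetical.

```python
import math

STEP = 1.0


def quantize(x, step=STEP):
    """Uniform quantizer: round to the nearest multiple of `step`."""
    return step * math.floor(x / step + 0.5)


def subtractive_dither(x, d, step=STEP):
    """Add dither before the quantizer and subtract the same dither afterward."""
    return quantize(x + d, step) - d


def error_stats(x, n_dither=1000, step=STEP):
    """Mean and mean-square error for input x, averaged over dither values
    spread evenly across [-step/2, +step/2]."""
    dithers = [((k + 0.5) / n_dither - 0.5) * step for k in range(n_dither)]
    errors = [subtractive_dither(x, d, step) - x for d in dithers]
    mean = sum(errors) / n_dither
    mean_square = sum(e * e for e in errors) / n_dither
    return mean, mean_square
```

For any input value the mean error comes out close to zero and the mean-square error close to STEP²/12, whereas the undithered error `quantize(x) - x` depends directly on x (for example, it is -0.3 for x = 0.3).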

An audio word of B bits, each of which is a pseudorandom
binary sequence, is a $2^B$-level approximation to a
signal with RPDF statistics, so that the data noise signals
considered earlier may be used as dither signals for
dithering audio to eliminate nonlinear quantization distortions
and modulation noise. Similarly, the M-level data
noise signals described in Section 2.3 using the remainder
modulo M for data, if made to be of a pseudorandom
form by a pseudorandom data encoding-decoding process,
can be used as an M-level approximation to RPDF
noise.

Although data noise signals are discrete approximations
to RPDF noise, they can be converted to continuous
noise statistics by the simple process of adding
to them an additional smaller RPDF noise with peak
levels ±1/2 LSB, where LSB is the step size of the LSBs
of the transmitted audio words (as distinct from the step
size STEP = M LSBs of any rounding process used in
encoding hidden data channels). This is shown schematically
in Fig. 5.

Conventionally, as described in [1] and [3], use of
subtractive dither requires the use of a decoding process
during playback in which the original dither noise
added before the quantizer is reconstructed before being
subtracted. This requires either the use of synchronized
pseudorandom dither generation algorithms, or an
encode-decode process in which the dither noise is generated
from the LSBs of previous samples of the audio signal
[3]. However, in the application of this paper, as will
be seen, no special dither reconstruction process is required
for the discrete dither since this is already present
in the transmitted LSBs.

3.2 Noise Shaping 

A white error spectrum is not subjectively optimum 
for audio signals, where it is preferred to weight the 
error spectrum to match the ears' sensitivity to different 
frequencies so as to minimize the audibility or perceptual 
nuisance of the error. The spectrum of the error signal 
may be modified to match any desired psychoacoustic 
criteria by the process of noise shaping, discussed, for 
example, in [1], [4], [23]-[25].

Noise shaping may be static (that is, adjusting the
spectrum in a time-invariant way) and made to minimize
audibility or optimize perceptual quality at low noise
levels, or alternatively it can be made adaptive to the
audio signal spectrum so as to be optimally masked by
the instantaneous masking thresholds of audio signals
at a higher level. The latter option is particularly valuable
in the present application, where loud audio signals
may well allow an increased error energy to be masked,
thereby allowing a higher data rate to be transmitted in
the hidden data channels during loud audio passages.

The form of noise shaping with subtractive dither used
in this paper is indicated in the schematic of Fig. 6. It
will be noted that, while it is equivalent to some of the
forms described in [1], it is not the arrangement described
previously by the authors in [3] in that here we
put the noise-shaping loop around the whole subtractive
process. With the arrangement of Fig. 6 the output of
the quantizer itself differs from the noise-shaped output
of the whole system by a spectrally white dither noise,
so that in this arrangement, unlike those suggested in
[3], the spectral shapes of the quantizer output error and
the system output error are not identical.

With the noise-shaped subtractively dithered quantizer
of Fig. 6 the error feedback filter $H(z^{-1})$ must
include a one-sample delay factor $z^{-1}$ in order to be
implementable recursively, and the originally white
spectrum of the subtractively dithered quantizer is
filtered by the frequency response of the noise-shaping
filter,

$$1 - H(z^{-1}) \tag{8}$$

which is preferably chosen to be minimum phase to
minimize noise energy for a given spectral shape [1], and
may be chosen to be of any desired spectral shape.

Other implementations of noise shaping around a dithered
quantizer system are possible. Alternative implementations
are reviewed in [4]. By way of example, Fig.
7 shows an alternative "outer" form of noise-shaping
architecture described in [4], which is equivalent to Fig.
6 if one puts

$$H'(z^{-1}) = \frac{H(z^{-1})}{1 - H(z^{-1})}. \tag{9}$$




The application of noise shaping around a subtractively
dithered quantizer will not result in any unwanted nonlinear
distortion or modulation noise, provided that the
dither noise added in Fig. 6 or 7 is RPDF dither matched
to the step size STEP of the quantizer.

4 APPLICATION TO BURIED DATA CHANNELS

4.1 Noise-Shaped Subtractively Dithered
Buried Channel Encoding

Either the arrangement of Fig. 6 or that of Fig. 7 can
be applied to obtain subtractively dithered noise-shaped
audio results when the last digits of an audio signal word
(whether the last N binary digits or the remainder after
division by M) are replaced by buried data bits.

The procedure is now simple to describe. The data
are first pseudorandomized and then used to form a data
noise signal as described. This data noise signal has
(discrete M-level) RPDF statistics, and may be used as
the dither noise source in Fig. 6 or 7. This is shown in
Figs. 8 and 9, where the quantizer is simply the process
of rounding the signal word to the nearest integer multiple
of M LSBs, or the nearest level if the levels are
placed uniformly at other than the integer multiples of M
LSBs. The process shown in Fig. 8 or 9 subtracts the data
noise signal from the audio at the input of the uniform
quantizer (which has step size STEP = M LSBs) and
adds it back again at the output of the quantizer so as
to make the least significant digits of the output audio
word equal to the data noise signal. Noise shaping is
performed around this whole process.

For best results using the algorithms of Figs. 8 or 9
(or equivalent algorithms such as that in Fig. 10), it is
best if the input audio word signal is available at a higher
resolution or word length than that used in the output,
since this will avoid cascading the rounding process used
in Fig. 8 or 9 with another earlier rounding process. By
making the input signal available at the highest possible
resolution, any overall degradation of the signal-to-noise
ratio is minimized.

Since the output equals the output of the quantizer
plus the data noise signal, the noise shaping has no
effect on the information representing the data in the
output audio word, but merely modifies the process
by which the quantization of the audio is performed
so as to minimize the perceptual effect of the added
data noise on the audio. It is remarkable that this output
signal, being the output of a noise-shaped subtractively
dithered quantizer, automatically incorporates
all the benefits of noise-shaped subtractive dither
without the audio-only listener needing any special
subtractive decoding apparatus.
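
The encoding loop can be sketched in Python as below. This is an illustrative reading of the inner (error feedback) arrangement rather than a definitive implementation: the function names, the representation of the filter $H(z^{-1})$ as a list of taps, and the first-order filter used in the usage note are all assumptions, and overflow handling is omitted.

```python
import math


def encode_buried(audio, symbols, M, h):
    """Noise-shaped buried-data encoder sketch.

    audio   : high-resolution samples in LSB units
    symbols : pseudorandomized data symbols, integers in [0, M)
    h       : taps h_1, h_2, ... of the error feedback filter H(z^-1)
    """
    past_err = [0.0] * len(h)            # quantizer errors of previous samples
    words = []
    for x, d in zip(audio, symbols):
        v = x - sum(hj * ej for hj, ej in zip(h, past_err))
        # Subtract the data, round to a multiple of M, add the data back.
        y = M * math.floor((v - d) / M + 0.5) + d
        err = y - v                      # error across the dithered quantizer
        past_err = ([err] + past_err)[:len(h)]
        words.append(y)
    return words


def decode_buried(words, M):
    """The data are simply the remainders of the audio words modulo M."""
    return [w % M for w in words]
```

With `h = [1.0]` the output error is the first difference of the quantizer error, that is, it is shaped by $1 - z^{-1}$, while `decode_buried` returns the embedded symbols whatever taps are chosen, which illustrates why the noise shaping needs no standardization.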

Moreover, because the information received by the
data channel user is not dependent on the noise-shaping
process, the noise shaping can be varied in any way
desired without affecting reception of the data, provided
only that no overflow occurs in the noise-shaping loop
beyond the range of available levels. (Putting a clipper
in the signal path before the quantizer to prevent this may
be desirable.) Thus the noise-shaping process does not affect
the way the signal is used by either audio or data end
users of the signal, and so does not need any standardization,
but may be used in any way desired by the encoding
operative to achieve any desired kind of static or dynamic
noise-shaping characteristic.

Other equivalent noise-shaped dithering architectures
may be used in place of those shown in Figs. 8 and 9
for encoding the data signals into the output audio word,
using the kind of equivalent architectures discussed in
[4]. Purely by way of example, Fig. 10 shows yet another
implementation having performance identical to
that shown in Fig. 8 or 9. It is also evident that, in a
similar way, the data noise signal can be added and
subtracted outside the "outer" noise shaper of Fig. 9
rather than inside the noise shaper as shown.

4.2 Buried Channel Decoding 

Optimum recovery of the audio channels involves no
need for any kind of decoder in this proposal. Playback
is conventional, with the effect of subtractive dither by
the data noise signal being automatic, as described.

Recovery of the buried data is also straightforward,
simply being recovery of the data noise signal by rejecting
the highest bits of the received audio word. In
the case of M-level data, the inverse process to the encoding
may be used, namely, reading the remainder of
the audio word after division by M, that is, resolving
the least significant digits of the audio word via modulo
M arithmetic. This is followed by the inverse pseudorandom
decoding process to recover the data before
pseudorandomization, and then the data are handled as data in
any way desired. This decoding is shown schematically
in Fig. 11.

In the case where the data are encoded as integer
coefficients with more than one base $M_k$, as in Eq.
(7), the data are recovered by K successive divisions by
$M_k$, at each stage discarding the fractional part,
the K coefficients $w_{(k)}$ being the integer remainders of
the division by $M_k$. This is the same process as that
for a single division by M, but with K stages of the modulo
division.

5 VECTOR QUANTIZATION AND DITHER

5.1 Reasons for Digression

It may not be completely clear to the reader without
further explanation that the preceding descriptions of the
use of noise-shaped subtractive dithering apply to the
case of stereo parity coding as well. To see this, we first
need to look at vector quantization and vector dithering
and to show that the exact same ideas for subtractive
dithering, noise shaping, and data encoding can be applied
to the vector quantizer case as to the scalar case
described earlier. Because the description in this section
may be found rather technical, we suggest that it be
omitted on first reading.

The description here is given in greater generality than
needed just for the stereo parity coding case, since it
has applications to coding information in the parity of
the corresponding bits in three or more channels in
transmission media carrying more than two audio or image
channels as, for example, in the three channels containing
the three components of a color image.

5.2 Uniform Vector Quantizers 

As briefly indicated in earlier papers [1], [3], [11],
the concepts of additive and subtractive dither can be
applied to vector as well as scalar quantizers. Vector
quantizers quantize a vector signal y comprising n scalar
signals $(y_1, \ldots, y_n)$ into geometrical regions covering
the n-dimensional space of n real variables. As in the
scalar case, we shall say that a vector quantizer Q is a
uniform quantizer if the signal y is quantized to a point
in a discrete grid G of quantization vectors $\{y_g : y_g \in G\}$
where there exists a region C around (0, ..., 0) of
n-dimensional space such that the regions $y_g + C =
\{y_g + c : c \in C\}$ cover without overlap (except at their
boundary surfaces) the range of the signal variables y
being quantized. Thus a uniform vector quantizer divides
the n-variable space into a grid of identical vector
quantization cells that are translates of the cell C to the
points of the grid G, and quantizes or rounds any point
in the cell $y_g + C$ to the point $y_g$.

There are many examples of uniform vector quantizers.
The simplest has a hypercubic cell C = the region
$\{(c_1, \ldots, c_n) : |c_i| \le \tfrac{1}{2}\,\mathrm{STEP} \ \forall\, i = 1, \ldots, n\}$, that
is, separate scalar quantization of the n variables. The
grid G in this case consists simply of the points of the
form $(m_1\mathrm{STEP}, m_2\mathrm{STEP}, \ldots, m_n\mathrm{STEP})$ for integer $m_j$,
and the associated vector quantizer is simply the one
that takes $(y_1, \ldots, y_n)$ to $m_j = \mathrm{round}(y_j/\mathrm{STEP})$ for
j = 1, ..., n, where "round" takes a number to the
nearest integer. This case is trivial in the sense that it
is equivalent to using separate uniform scalar quantizers
on each of the n channels.

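A more complicated but easily visualized example is
the two-channel case where C is a regular hexagon in
the plane as, for example, the region consisting of points
$(c_1, c_2)$ in the plane such that

$$|c_1| \le \tfrac{1}{2}\mathrm{STEP}, \quad
\left|\tfrac{1}{2}c_1 + \tfrac{\sqrt{3}}{2}c_2\right| \le \tfrac{1}{2}\mathrm{STEP}, \quad
\left|-\tfrac{1}{2}c_1 + \tfrac{\sqrt{3}}{2}c_2\right| \le \tfrac{1}{2}\mathrm{STEP}. \tag{10}$$

Here the grid G is the centers of the hexagons in the
honeycomb grid covering the plane, that is, G is the set
of the points

$$\left(\left(m_1 + \tfrac{1}{2}m_2\right)\mathrm{STEP},\ \tfrac{\sqrt{3}}{2}m_2\,\mathrm{STEP}\right) \tag{11}$$

for integers $m_1$ and $m_2$.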

A uniform vector quantizer of particular interest and
practical use in n dimensions is what we shall term the
rhombic quantizer. This starts off with a conventional
hypercubic grid $G_c$ of points at positions
$(m_1\mathrm{STEP}, m_2\mathrm{STEP}, \ldots, m_n\mathrm{STEP})$, where STEP is a
step size and $m_1, \ldots, m_n$ are integers, which, of course,
includes the hypercube quantizer cell just described and
corresponds to the use of n separate scalar uniform
quantizers. However, we then produce a new grid
$G \subset G_c$, which consists of just those grid points in
$G_c$ with $m_1 + \cdots + m_n$ having even integer
values. This new grid only has half as many points as
the original, and it can be equipped with a new vector
quantization cell C as follows, which we shall term the
n-dimensional rhombic quantizer cell.

The rhombic quantizer cell can be described geometrically
by thinking of the original hypercubic cells as being
colored white if $m_1 + \cdots + m_n$ is even and black if
$m_1 + \cdots + m_n$ is odd, forming a kind of n-dimensional
checkerboard pattern of alternately black and white
hypercubes. Then attach to each white hypercube that
"pyramid" portion of each adjacent black hypercube lying
between the center of the black hypercube and the
common "face" with the white hypercube. The resulting
solid is the rhombic cell C.

It is evident, since together all the pyramid portions
taken from adjacent black hypercubes are enough to
form one black hypercube if pieced together, that the
volume occupied by the rhombic quantizer cell is twice
that occupied by the original hypercube quantizer cell,
and that the versions of the rhombic quantizer cell translated
by grid G indeed cover the n-dimensional n-parameter
vector signal space.

For n = 2 the rhombic quantizer cell C is a diamond
shape, being a square whose sides are rotated 45° relative
to the channel axes, as shown in Fig. 12. For n = 3 the
rhombic quantizer cell C is a rhombic dodecahedron, a
12-faced solid whose faces are rhombuses. For n = 4
the rhombic quantizer cell C is a regular polytope unique
to four dimensions, termed the regular 24-hedroid [28].

Calculations involving quite complicated multidimensional
integrals, which we shall not detail here, show,
for a given large number of quantizer cells covering a
large region of n-dimensional space, that for n = 2,
rhombic quantization has the same signal-to-noise ratio
as conventional independent quantization of the channels,
but that for n ≥ 3, rhombic quantizers give a better
signal-to-noise ratio than conventional independent
quantization of the channels. The improvement reaches
a maximum of about 0.43 dB when n = 6. This improvement
in signal-to-noise ratio is maintained when additive
or subtractive dither is used as described hereafter. (The
hexagonal two-channel quantization described earlier
gives a 0.16-dB better signal-to-noise ratio than independent
quantization of two channels.)

Mathematically, the rhombic quantizer has grid G
consisting of the points

$$(m_1\mathrm{STEP}, m_2\mathrm{STEP}, \ldots, m_n\mathrm{STEP}) \tag{12a}$$

where the $m_i$ have integer values with

$$m_1 + \cdots + m_n \ \text{having even integer values.} \tag{12b}$$




Fig. 12. Two-dimensional rhombic quantizer region (shaded
square with sides tilted 45°) shown against a background
(squares with horizontal and vertical sides) of conventional
independent quantizers (whose square quantizer region is
darkly shaded) on channels $y_1$ and $y_2$.

The rhombic cell C is that region of points $(c_1, \ldots, c_n)$
that satisfies the n(n - 1) inequalities

$$|c_i + c_j| \le \mathrm{STEP}, \quad |c_i - c_j| \le \mathrm{STEP} \tag{13}$$

for i ≠ j selected from 1, ..., n. The associated uniform
vector quantizer rounds a vector signal $(y_1, \ldots, y_n)$ by
an algorithm whose outline form might be

    m'_i := round(y_i/STEP) for all i = 1, ..., n
    If m'_1 + ... + m'_n is even
    then m_i := m'_i for all i = 1, ..., n
    else c_i := y_i/STEP - m'_i for all i = 1, ..., n
    (*)  d_j := sgn(c_j) if |c_j| > |c_i| for all i < j and
                            |c_j| >= |c_i| for all i > j
         d_i := 0 for all other i,
         m_i := m'_i + d_i for all i = 1, ..., n
    End if.                                            (14)
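
A direct Python transcription of this outline algorithm is sketched below. Two details are assumptions of the sketch rather than part of the outline: rounding of exact halves is upward, and sgn(0) is treated as +1 so that an input lying exactly on an odd grid point still moves to an even one. The function name is hypothetical.

```python
import math


def rhombic_quantize(y, step=1.0):
    """Return integers m_i with an even sum such that (m_1*step, ..., m_n*step)
    is a nearest rhombic grid point to the vector y."""
    m = [math.floor(v / step + 0.5) for v in y]
    if sum(m) % 2 != 0:
        c = [v / step - mi for v, mi in zip(y, m)]
        # First index of the largest |c_i|, matching the tie rule in line (*).
        j = max(range(len(y)), key=lambda i: abs(c[i]))
        m[j] += 1 if c[j] >= 0 else -1
    return m
```

For example, `rhombic_quantize([0.9, 0.3])` first rounds to (1, 0), whose sum is odd, and then moves the second coordinate (the one with the larger rounding error) to give (1, 1).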

There are, of course, various equivalent forms for this
kind of rhombic quantizer algorithm, a computationally
demanding aspect on typical signal processors being the
determination in line (*) of that j for which $|c_j|$ is biggest.
In the n = 2 case there is a simpler rhombic quantization
algorithm as follows:

    x_1 := (y_1 + y_2)/√2
    x_2 := (y_1 - y_2)/√2
    m'_1 := round(x_1/(√2 STEP))
    m'_2 := round(x_2/(√2 STEP))
    m_1 := m'_1 + m'_2
    m_2 := m'_1 - m'_2                                 (15)

which is based on the observation that the rhombic quantizer
cell for n = 2 has the same shape as the square
cell used for ordinary independent quantization of the
two channels, but rotated by 45° and with an increase
of the step size by a factor of √2 (Fig. 12).

5.3 Subtractive Vector Dither

The concepts of dithering developed in [1]-[4] for
scalar uniform quantizers may also be applied to the
vector case by using appropriate vector dithers. An
n-signal dither noise vector $(v_1, \ldots, v_n)$ is said to have
a uniform probability distribution function (PDF) in a
region C of n-dimensional space if its joint probability
distribution function is constant within the region C and
zero outside it. This is the n-dimensional generalization
of rectangular PDF dither for vector signals, and we
denote the associated n-vector dither signal by $r_C$.

It can be shown (we omit any proofs here) that if
the subtractive dither arrangement of Fig. 4 is used for
modifying an input vector signal, where the uniform
quantizer becomes a vector uniform quantizer with
quantization cell C and the dither noise becomes a uniform
PDF vector dither $r_C$, then the output
vector signal of the system is free of all nonlinear distortion
and modulation noise effects (that is, the first moment
of the output signal error is zero, and the second
moment is independent of the input signal [4]). Moreover,
this is still the case if any statistically independent
additional noise is added to the uniform PDF dither noise
$r_C$ on the region C.

Noise shaping can be applied around such
subtractive dither in exactly the same way as shown in
Figs. 6 and 7, or in equivalent noise-shaping architectures,
the only difference being that any filtering is now
applied to n parallel signal channels. It is also possible,
if desired, to use an n × n matrix error feedback filter
$H(z^{-1})$ or $H'(z^{-1})$ in order to make the noise shaping
dependent on the vector direction, for example, to optimize
directional masking of noise by signals [11], [12].

It is possible to generate uniform PDF vector dither
$r_C$ in the rhombic cell C by an algorithm such as
the following. First generate, for example, by the well-known
congruence method, n statistically independent
rectangular PDF dither signals $r_i$ (i = 1, ..., n) with
peak values ±1/2 STEP, and also generate an additional
two-valued random or pseudorandom signal u with value
either 0 or 1. Then the values of the noise signal $r_C$ =
$(v_1, \ldots, v_n)$ are given by

    If u = 0
    then v_i := r_i for all i = 1, ..., n
    else d_j := sgn(r_j) if |r_j| > |r_i| for all i < j and
                            |r_j| >= |r_i| for all i > j
         d_i := 0 for all other i,
         v_i := r_i - d_i STEP for all i = 1, ..., n
    End if.                                            (16)

However, in applications of subtractive dither, this algorithm
may involve unnecessary complication, since it
can be shown that with the subtractive dither arrangement
of Fig. 4 with a uniform vector quantizer with
quantization cell C, a uniform PDF vector dither signal
$r_D$ may be used for any other uniform quantization cell
D sharing the same grid G, and it will still eliminate
nonlinear distortion and modulation noise in the output.
Whatever the shape of the other quantization cell D used
for the dither signal, the resulting error signal from the
subtractive dither arrangement of Fig. 4 is a noise signal
with uniform PDF statistics on the quantizer cell C of
the uniform vector quantizer used.

This can allow a much simpler algorithm to be used
for generating the vector dither in which u STEP is added
to (or subtracted from) just one of the n rectangular PDF
noise components. For example, a uniform PDF vector
dither noise signal $r_D = (v_1, \ldots, v_n)$ given by

    v_1 := r_1 - u STEP
    v_i := r_i for i = 2, ..., n                       (17)

may be used to subtractively dither this rhombic quantizer.
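
The following Python sketch checks this simpler dither numerically for n = 2. It is an illustration under assumptions: the rectangular PDF components are replaced by an evenly spaced grid of values so that the result is deterministic, the rhombic quantizer is the nearest-even-sum-point rule with halves rounded upward, and all function names are hypothetical.

```python
import math


def rhombic_quantize(y, step=1.0):
    """Nearest grid point (as coordinates) whose integer indices sum to an even number."""
    m = [math.floor(v / step + 0.5) for v in y]
    if sum(m) % 2 != 0:
        c = [v / step - mi for v, mi in zip(y, m)]
        j = max(range(len(y)), key=lambda i: abs(c[i]))
        m[j] += 1 if c[j] >= 0 else -1
    return [mi * step for mi in m]


def simple_vector_dither(r, u, step=1.0):
    """Subtract u*step from the first rectangular PDF component only."""
    v = list(r)
    v[0] -= u * step
    return v


def error_moments(y, k=100, step=1.0):
    """Mean error vector and mean squared error length for a 2-vector input y,
    averaged over a k-by-k grid of (r_1, r_2) values and both values of u."""
    axis = [((i + 0.5) / k - 0.5) * step for i in range(k)]
    total = [0.0, 0.0]
    total_sq = 0.0
    count = 0
    for u in (0, 1):
        for r1 in axis:
            for r2 in axis:
                v = simple_vector_dither([r1, r2], u, step)
                q = rhombic_quantize([a + b for a, b in zip(y, v)], step)
                e = [qi - vi - yi for qi, vi, yi in zip(q, v, y)]
                total[0] += e[0]
                total[1] += e[1]
                total_sq += e[0] ** 2 + e[1] ** 2
                count += 1
    return [t / count for t in total], total_sq / count
```

For each input tried, the mean error is close to zero and the mean squared error length is close to STEP²/3, the second moment of a uniform distribution over the diamond-shaped cell, independently of the input.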

5.4 Nonsubtractive Case 

Although we shall not need to use the nonsubtractive 
vector dither case in the hidden data channel application 
of this paper, it is easy to note the extension of the 
preceding to the nonsubtractive case. As in the scalar 
case reported in [2], it can be shown that a uniform 
vector quantizer with quantizer cell C can be made to 
give an output suffering from no nonlinear distortion 
or modulation noise if dither noise is added before the
quantizer that has the form of the sum of two statistically 
independent uniform PDF vector dithers, each of the 
form r c over the region C. 

Such a dither is a vector analog of the triangular PDF 
dither [2] used in the scalar case, and may similarly be 
subjected to noise shaping of the dithered vector quan- 
tizer without introducing nonlinear distortion or modula- 
tion noise effects. As in the scalar case, such nonsub- 
tractive dithering with no modulation noise gives a noise 
energy three times as large as does subtractive dithering.
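
The factor of three can be checked in the scalar case using the standard result that a rectangular PDF of width $\Delta$ has variance $\Delta^2/12$. Here $\Delta$ stands for the step size STEP, and the subscripted symbols are labels introduced only for this check:

$$\sigma^2_{\mathrm{sub}} = \frac{\Delta^2}{12}, \qquad
\sigma^2_{\mathrm{nonsub}} = \underbrace{\frac{\Delta^2}{12}}_{\text{quantizer error}} + \underbrace{2 \cdot \frac{\Delta^2}{12}}_{\text{two RPDF dithers}} = \frac{\Delta^2}{4} = 3\,\sigma^2_{\mathrm{sub}},$$

which corresponds to a noise increase of $10 \log_{10} 3 \approx 4.8$ dB.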

6 REFINEMENTS OF BASIC PROPOSAL 

6.1 Further Developments 

The encoding process described will work well as
it stands, but it does not incorporate various desirable
refinements, which we shall now describe. These include
methods to take account of the fact that the data noise
signal is a dither with a discrete rather than a continuous
PDF, and applications involving stereo parity coding.

6.2 Nondiscrete Dither

The fact that the dither given by the data noise signal
has an M-level discrete probability distribution function
rather than a continuous RPDF means that there is still
unwanted quantization distortion at the level of the LSB
of the audio word which is not properly dithered. Preferred
methods of adding "nondiscrete" dither (or,
strictly speaking, dither at a significantly high arithmetic
accuracy such as implemented using 24- or 32-bit arithmetic)
are now described. The method of adding such
dither shown in Fig. 5 is not preferred for three reasons:

1) Optimum playback requires subtractive decoding
of the ±1/2 LSB RPDF dither signal, with all the usual
problems of implementing subtractive dither [1], since,
unlike the discrete data noise signal, this is not explicitly
transmitted in the audio word.

2) The ±1/2 LSB RPDF dither signal added before
the quantizer does not eliminate modulation noise in
nonsubtractive playback, having the wrong statistics for
this purpose [2].

3) If the whole system is noise shaped as in Figs. 6
or 7, the nonsubtractive listener will hear the ±1/2 LSB
RPDF dither signal as having a white spectrum not affected
by the noise shaping, and thus will perceive an
increase in noise level.

A correct way of adding extra dither to avoid nonlinear
quantization distortion and modulation noise at the ±1/2
LSB level is shown in Fig. 13. The dither used has a
triangular PDF with peak levels ±1 LSB (so-called
TPDF dither) with independent statistics at each discrete
time instant, so as to eliminate modulation noise in
nonsubtractive playback [2]. It is added before the quantizer
in the noise-shaping loop, but not subtracted in the
noise-shaping loop. This ensures that the added noise in
nonsubtractive playback is noise shaped.

Subtractive playback of the extra dither is done, also 
as shown in Fig. 13, by reconstituting the triangular ± 1 
LSB PDF dither at the playback stage, passing it through 
a noise-shaping filter $1 - H(z^{-1})$ and subtracting the
filtered noise from the output audio word. Subtractive 
playback of course reduces the extra noise energy caused 
by the nondiscrete dither by a factor of 3, although this 
will only be highly advantageous in the case where the 
data noise signal has fairly low energy, such as at a data 
rate of 1 BPSS. 

The triangular dither signal may be generated, in encoding,
as proposed in the "autodither" proposal of [3],
by means of a pseudorandom logic look-up table (or a
logic network having the effect of a pseudorandom look-up
table) from the less or least significant parts of the
output audio word in the K previous samples, where
typically K may be 24, and can be reconstructed from
the same audio word at the input of the system by the
same look-up table or logic in the decoding stage. This
is shown in Fig. 14 for the system of Fig. 13.
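The idea that encoder and decoder can derive the same triangular dither from the transmitted audio words can be sketched as follows. This is an illustration under stated assumptions, not the look-up table of [3]: a cryptographic hash stands in for the pseudorandom table, the number of low-order bits used (`low_bits`) is chosen arbitrarily, and the names `autodither` and `history` are hypothetical.

```python
import hashlib

K = 24  # number of previous output words used (the typical value quoted above)

def autodither(prev_words, low_bits=4):
    """Derive a TPDF dither value in [-1, 1) LSB from previous output words.

    A hash stands in for the pseudorandom look-up table; both the hash and
    the choice of `low_bits` are assumptions made for this sketch.
    """
    mask = (1 << low_bits) - 1
    key = bytes(w & mask for w in prev_words[-K:])
    digest = hashlib.blake2b(key, digest_size=8).digest()
    a = int.from_bytes(digest[:4], "big") / 2**32 - 0.5  # uniform in [-1/2, 1/2)
    b = int.from_bytes(digest[4:], "big") / 2**32 - 0.5  # second uniform value
    return a + b  # the sum of two uniform values has a triangular PDF

history = [1000 + 37 * i for i in range(K)]  # stand-in for transmitted audio words
encoder_dither = autodither(history)
decoder_dither = autodither(list(history))   # the decoder sees the same words
assert encoder_dither == decoder_dither
```

Because the dither depends only on words that the decoder also receives, no side information is needed to subtract it on playback.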

Although Figs. 13 and 14 are shown for the particular 
noise-shaping architecture of Fig. 6, similar ways of 
adding the extra triangular dither can be used with any 
other equivalent noise-shaping architecture such as the 
outer form of Figs. 7 and 10, again by adding the triangular
dither just before the quantizer and subtracting it
again, via a noise-shaping filter $1 - H(z^{-1})$, only at the
output of the decoder. It is clear that the points at which 
dither signals are added can be shifted around in various 
ways without affecting the functionality. 

6.3 The Stereo Parity Case 

Suppose we have two-channel stereo signals in which
data are encoded pseudorandomly in bit N for all N =
15 to, say, 15 - h + 1 (where the integer h may
typically be any integer from 0 to perhaps 6 or 8, 0
being the case of no bits being encoded) of the left and
right audio words. Data are also being encoded in the
stereo parity (Boolean sum) of bit 15 - h of the left
and right audio words, as described in Section 2.2.

Based on the results on uniform vector quantization
and subtractive vector dither of Section 5, the noise-shaped
subtractive encoding of the data described in the
scalar case for individual audio channels may be applied
to this case too with just two reinterpretations:

1) The uniform quantizer used in Figs. 6-10 now becomes
a two-dimensional rhombic quantizer (as described by the
algorithm of Eq. (13) and illustrated in Fig. 12) with
STEP = $2^h$ LSB.

2) The "data noise signal" used for dithering is given,
for example, by Eq. (17), where $r_i$ is the integer word
represented by the last h bits of the ith-channel audio
word (i = 1 for the left channel, i = 2 for the right
channel), and u is the parity of bits 15 - h of the left
and right audio words. In units of LSB the data noise
signal for the left channel is then $L_a - 2^h u$ and for
the right channel it is $R_a$, where $L_a$ and $R_a$ are the
respective integer words represented by the last h bits
of the audio word formed by the data in the two channels.
Any alternative data noise signal may be used that
represents an appropriate uniform PDF vector dither as
described in Section 5.3, such as that given by the
algorithm of Eq. (16).

The nonlinear distortion and modulation noise
effects at the LSB level caused by the fact that the vector
data noise is discrete rather than continuous can be removed
by using exactly the same technique as that described
in Section 6.2 and Figs. 13 and 14, by adding and,
where appropriate, subtracting ±1 LSB triangular
PDF dither in each channel separately, the only difference
being that the uniform quantizer has become a
rhombic vector quantizer and the data noise has
a modified vector form, as just described.

The particular case h = 0, where data are transmitted
only in the stereo parity of the LSB of the audio word in
the two channels, simply uses the parity signal itself at
the LSB level as the data noise signal for one channel
in the encoding process; it does not matter which of
the two channels is chosen. With subtractively dithered
playback it turns out that the use of properly designed
stereo parity coding of data, using a rhombic vector
quantizer in the encoding, gives a noise
level 1 dB lower than would be obtained by the process
of coding the data into the LSBs of the words of just
one of the two audio channels. Thus stereo parity coding
gives not only audio left-right symmetry but also a
noise advantage.
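On the decoding side, the data layout of this section can be illustrated with a short sketch. It is not part of the original proposal: it assumes integer audio words, works with shifts from the LSB rather than with the bit numbering used above, and the function name `extract_stereo_data` is hypothetical.

```python
def extract_stereo_data(left, right, h):
    """Recover buried data from one pair of stereo audio words.

    Returns the h low-order bits of each word plus the stereo parity bit,
    i.e. the exclusive-OR of the next bit up in the two channels.
    """
    mask = (1 << h) - 1
    left_data = left & mask
    right_data = right & mask
    parity = ((left >> h) ^ (right >> h)) & 1
    return left_data, right_data, parity

# h = 2: two bits per channel plus one parity bit, i.e. 5 bits per stereo sample.
print(extract_stereo_data(0b1011_0110, 0b0100_1001, 2))  # (2, 1, 1)
```

With h = 0 only the parity of the two LSBs carries data, which is the particular case discussed above.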






6.4 Generalized Stereo Parity Coding 

There are various generalizations of the particular
stereo parity coding case just described. We outline
these briefly to show the applicability of the technique.

A first obvious generalization is that the same process
may be applied to audio word lengths other than the
16-bit word length of CDs, for example the 18-bit,
20- or 24-bit word lengths used in some professional
audio applications, when it is desired to hide data in the
audio words. For example, in [3] the authors described
a proposal to add data at the 24th bit in studio operations
in order to detect whether or not a signal had been
modified, and the data encoding techniques of this paper can
be used in that application to minimize the audibility of
the modification of the signal proposed there.

The second generalization is that one can also apply
the method not just to bits but to the general case of any
integer M > 1. In this case, data are coded into the residue
of the audio words in the two channels after division by
M, and the stereo parity data channel is coded into the
Boolean sum of the binary LSB in the two channels of the
integer parts of the audio words divided by M. This case
is handled identically to that in Section 6.3 except that
$2^h$ is replaced throughout by M, and the phrase "last h
bits" is replaced by "residue modulo M."

A third generalization instead considers n channels
rather than two. As before, this uses a rhombic quantizer
in the encoding process for STEP = M LSBs. However,
now one uses the n-dimensional rhombic quantizer described
in Eqs. (12)-(14) and a vector data noise signal
comprising the n M-level data noise signals generated
from the residues modulo M of the data conveyed, to one of
which is added or subtracted u STEP, where u is the parity
(Boolean sum) of the binary LSB in the n channels of the
integer parts of the audio words divided by M. Other than
replacing the ordinary uniform quantizer with step
size STEP by a rhombic quantizer and using the modified
data noise signal, the descriptions given earlier for coding
data still apply to this case.
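The generalized layout (residues modulo M in each of n channels, plus one parity bit) can be read back as in the following sketch. The function names are hypothetical, and the capacity formula is simply a count of the information carried (log2 M bits per channel plus one parity bit) rather than a figure quoted in the text.

```python
from functools import reduce
from math import log2

def extract_buried_data(words, M):
    """Split n audio words into per-channel residues mod M and one parity bit."""
    residues = [w % M for w in words]
    # Parity of the binary LSB of the integer part of each word divided by M.
    parity = reduce(lambda a, b: a ^ b, ((w // M) & 1 for w in words))
    return residues, parity

def bits_per_frame(n, M):
    """Buried-data capacity, in bits, of one n-channel sample."""
    return n * log2(M) + 1

print(extract_buried_data([23, 40, 17], 3))  # ([2, 1, 2], 1)
print(bits_per_frame(2, 4))                  # 5.0
```

For M = 2^h and n = 2 this reduces to the stereo case of Section 6.3.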

Note that the choice of which channel of the vector 
data noise signal to add or subtract u STEP, and the 
choice of whether to add or to subtract, can be made 
freely, and that this choice can be made adaptively in- 
stant by instant to minimize data noise energy if desired, 
such as by making the choice that minimizes the maxi- 
mum of the data noise signals in the n channels at each 
instant. This choice is a discrete approximation to that 
described in Eq. (16) for uniform PDF vector dither over 
a rhombic quantizer cell. 
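One way to read the adaptive choice described above is as a small search over channels and signs. The sketch below is an interpretation, not the authors' algorithm: it assumes that "maximum of the data noise signals" means the largest magnitude over the n channels, that the per-channel values are given in LSB, and the name `place_parity_offset` is hypothetical.

```python
def place_parity_offset(r, u, step):
    """Choose the channel and sign for the u*STEP offset.

    r    : per-channel data noise values (in LSB) before the offset
    u    : parity bit (0 or 1)
    step : STEP in LSB
    Returns the vector that minimizes the largest magnitude over channels.
    """
    best = None
    for k in range(len(r)):
        for sign in (+1, -1):
            v = list(r)
            v[k] += sign * u * step
            peak = max(abs(x) for x in v)
            if best is None or peak < best[0]:
                best = (peak, v)
    return best[1]

print(place_parity_offset([1, 3, 0], u=1, step=4))  # [1, -1, 0]
```

When u = 0 every candidate is identical and the input vector is returned unchanged.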

6.5 Low-Bit-Rate Case 

If one has n transmitted channels of audio, then the
parity of their LSBs can be used to transmit a 1-bit
per n-channel-sample data channel with remarkably little
loss of signal-to-noise ratio, especially in the case where
full subtractive dithering is used at the LSB level. One
might expect a loss in signal-to-noise ratio of 6.02/n dB
because the loss is shared among n channels, but for
n > 2, one gets a smaller loss, typically between 0.3
and 0.4 dB better, because of the fact noted in Section
5, namely, that rhombic vector quantization has a better
signal-to-noise ratio than independent channel quantization
for a given density of quantization points in the
quantization grid. For n = 6, a 1-bit per n-channel-sample
subtractively dithered buried data channel causes
a degradation in signal-to-noise ratio of less than 0.6 dB
compared with a properly dithered case with no buried
data channel.
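The naive 6.02/n dB estimate is easy to tabulate for comparison with the figures quoted above. This is only the simple shared-loss estimate; the rhombic quantizer gain is taken from the text, not computed here, and the function name is illustrative.

```python
def naive_loss_db(n, db_per_bit=6.02):
    """Loss expected if one culled bit were simply shared among n channels."""
    return db_per_bit / n

for n in (2, 3, 6):
    print(n, round(naive_loss_db(n), 2))
```

For n = 6 the estimate is about 1.0 dB, so the quoted degradation of less than 0.6 dB corresponds to a gain of roughly 0.4 dB from rhombic vector quantization.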

Exactly the same techniques can be used to convey
data via q successive samples of a monophonic signal,
for example, by coding into the parity of the LSBs
of each successive block of q samples, as described in
Section 2.3. What we have now shown is that by using
the parity signal as a subtractive dither for any one sample
with a q-dimensional rhombic quantizer, plus normal
triangular additive or subtractive dither, this fractional-rate
channel can be coded with a very small loss in
signal-to-noise ratio (for example, 0.6 dB for a block
length q = 6), and yet with no nonlinear distortion or
modulation noise in either nonsubtractive or subtractive
reproduction.

This kind of efficient low-bit-rate culling of data ca- 
pacity could be used, for example, with successive sam- 
ples within individual subband channels of a subband 
data compression system. Its application is not confined 
to audio. Culling, say, 1 bit per six 10-bit video samples
in a digital video recorder with a video data rate of 200
Mbit/s would give a data rate typically enough for four
16-bit audio channels or a consumer-grade additional
data-reduced video signal while losing only 0.6 dB in
video signal-to-noise ratio in the original video channel.
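The video example can be checked with a few lines of arithmetic. The 48-kHz audio sampling rate used for comparison is an assumption of this sketch (the text does not state one), and the function name is hypothetical.

```python
def culled_rate_bits_per_s(carrier_rate, bits_per_sample, samples_per_culled_bit):
    """Data rate obtained by culling 1 bit from every block of samples."""
    samples_per_s = carrier_rate / bits_per_sample
    return samples_per_s / samples_per_culled_bit

video = culled_rate_bits_per_s(200e6, 10, 6)  # 1 bit per six 10-bit samples
audio = 4 * 16 * 48_000                       # four 16-bit channels at an assumed 48 kHz
print(f"culled: {video / 1e6:.2f} Mbit/s, audio needs: {audio / 1e6:.2f} Mbit/s")
```

Under these assumptions the culled channel of about 3.3 Mbit/s slightly exceeds the roughly 3.1 Mbit/s needed for the four audio channels.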

7 CONCLUSIONS 

7.1 Audio Quality Considerations 

Anyone concerned with the future potential of the
audio art will have some concern about using information
originally allocated to a high-quality audio signal
to transmit other data instead, as in the proposal in this 
paper. In order to encourage progress in the audio art, 
there is a need for at least one widely available consumer 
medium without built-in serious quality compromises, 
such as CDs (unlike data-reduced digital systems) offer, 
so that the market is there in which recordings with 
improved quality can be made, heard, and sold. Without 
such a medium, we shall find ourselves permanently 
locked into limitations many of which will only become 
apparent as the art of recording, psychoacoustics, and
studio production develop further. 

Even the best theoretical models of the ears are still 
extremely crude, for example, not describing the effect 
of hearing multiple events with individually low but
jointly high detection probabilities, especially for
nonstationary or transient signals. Many of the musical
subtleties of the best "purist" recordings probably reside in
these areas of our technical ignorance.

We have therefore been concerned with devising bur- 
ied channels that satisfy far more stringent requirements 
than simply crude masking models, which we feel still 
have limited applicability to state-of-the-art recording 
quality. This conservative attitude means that — although 
the option is there with adaptive data rates and noise 
shaping for our proposal to code data if desired to satisfy 
existing masking models — such masking models are in 
no way assumed in the standard. It is a matter of judg- 
ment on a case-by-case basis of individual recordings 
whether such signal manipulations of the error are sub- 
jectively acceptable. 

In cases where such compromises are not acceptable
or are considered too risky (especially for material with 
high or serious artistic intent), our proposal allows the 
hidden data channels to produce the most benign kind 
of error— a steady-noise error free of all nonlinear dis- 
tortion or modulation noise, and having any desired 
spectral shape. Unlike previous proposals, this allows 
avoidance of all psychoacoustically disturbing patterns 
in the error signal, whether related to the audio signal 
or to patterns in the transmitted data. 

The beauty of this proposal is that by incorporating 
noise shaping and subtractive dither, it avoids adding 
any more error noise to the audio signal than is strictly 
necessary to handle the desired data rate, typically 
allowing up to 20 dB better perceived signal-to-noise
ratio than would be achieved simply by replacing the 
relevant audio word bits by data, and typically allowing 
up to 25 dB better perceived signal-to-noise ratio than 
would be achieved were one also to attempt adding dither 
in a simple replacement scheme to avoid nonlinear dis- 
tortion and modulation noise. 

Particularly at low data rates, the audio performance 
of our proposed scheme will typically be comparable to 
or better than some of the better noise-shaped dithering 
systems currently on the market; that is, a CD carrying
the hidden data channels is likely to sound better than
current CDs without the data channel, since the encoding
standard incorporates properly designed dithering (and
optional properly designed noise shaping). Even at the
higher data rates, the use of proper dithering may well
mean a better sound than is currently the norm.

All other things being equal, an audiophile listener 
would not choose any degradation of audio quality, even 
if this takes the form of a smooth steady noise free of 
unwanted modulation and nonlinear distortion effects. 
But things are not equal since the data channels can 
be used to convey additional audio channels in a fully
compatible way. Provided the coding of these additional
audio channels is done with sufficient care to avoid audible
data-reduction artifacts, we believe that the overall
improvement obtained by adding at least one extra audio 
channel, either for horizontal B-format ambisonic sur- 
round sound or for three-channel frontal-stage stereo, 
may subjectively more than make up for the relatively 
benign loss in signal-to-noise ratio (compared to the best 
noise-shaped dithered performance of which CDs are 
capable) of the added data channels. 

Alternative audio uses of the additional data channel
include compatible frequency-range extension without 
the audible degradations of quality heard in existing 
commercial schemes for this, and the transmission of 
level-alteration information to allow dynamic-range ad- 
justment of the recording for users equipped with data 
decoders. 

7.2 Summary 

In this paper we described a method of forcing the 
least significant information in the audio words to con- 
form to the data values of data channels, while ensuring 
that the effect on the audio is that of adding a noise- 
shaped steady pattern-free random noise at a level no 
greater than would be expected from Shannon informa- 
tion theory from the number of bits "stolen" from the 
audio for an optimally noise-shaped subtractively dith- 
ered system. 

These techniques involve a process for pseudoran- 
domizing the data so that the audio sees it as a random 
noise signal which is optimized for subtractively dith- 
ering the audio, to eliminate both nonlinear distortion 
and modulation noise. Not only is the subtractive dith- 
ering automatically operative in ordinary playback, but 
in addition full noise shaping can be applied to the data 
dither as well. 

This paper has further extended this technique not just 
to the encoding of data in individual audio signals, but to 
a technique— stereo parity coding— that allows efficient 
coding of data jointly into two or more audio channels, 
by using a vector quantization and subtractive vector 
dithering process. The joint coding process not only
ensures symmetry of the way noise is distributed among
the audio channels, but in addition gives a substantial 
improvement in noise performance, especially at low 
data rates in the data channels. The attainable noise 
performance approaches the theoretical Shannon limits 
for the combined Shannon data rate of the audio and 
buried data channels. 

In describing these techniques, a brief account has
been given of the generalization of the ordinary theory
of subtractive dither to the vector quantizer and vector
dither case.

Possible uses of the resulting benign hidden data channels
have been described, including additional audio
channels for multiloudspeaker stereo or surround sound,
audio bandwidth extension, dynamic range control, as
well as obvious data applications such as graphics, text
or lyrics, copyright, track information, and even
data-reduced video.

Unlike previous approaches, no assumptions have 
been made regarding the masking abilities of the ears. 
Rather the design aim has been to ensure that the only 
effect on the existing audio of adding data is to cause a 
minimal increase in steady background noise, ensuring
no compromise with other audio virtues of CDs. If a 
noise performance comparable to good current CDs is 
acceptable, this allows data rates of up to 350 kbit/s to 
be transmitted in the buried data channel, although much 
more stringent noise requirements can be met at the 
expense of a reduced data rate.

10 PATENT NOTE

The authors have applied for patents on various tech- 
niques described in this paper, which are now assigned 
to XtraBits. 

APPENDIX 

CALCULATION OF HOW MANY BITS
CAN BE BURIED 

Here we discuss whether it is possible to quote a figure
for how much data can be buried while maintaining
a signal-to-noise ratio psychoacoustically equivalent to
"conventional" CDs. We take the conventional CD as
being dithered with white nonsubtractive TPDF dither
as advocated by Lipshitz et al. [2] some years ago. This
has a signal-to-noise ratio of approximately 93.3 dB.³
This is chosen because it can be shown to be [2], [4]
the least "white" dithering that avoids completely, with
nonsubtractive reproduction, the effects of low-level
nonlinear distortion and modulation noise due to quantization
effects.

A plausible starting point is to see how much psycho- 
acoustic improvement can be obtained by optimal noise 
shaping to add the 4.77-dB advantage from subtractive 
dither, and to express the result as the maximum number 
of bits per channel that can be "culled" while retaining 
the original performance. 

In a recent paper Stuart and Wilson [29] examine the 
psychoacoustic advantage in some detail, and in Table
1 of that reference we see a threshold advantage of, for
example, 15.3 dB for the "N2.9" noise shaper under
"MAFM" listening conditions. Stuart and Wilson also 
draw attention to the synergy between noise shaping and 
pre- or deemphasis. Conventional deemphasis is ineffi- 
cient at noise reduction because the greatest reduction 
occurs in spectral regions (at 10 kHz and above) to which 
the ear is relatively insensitive. The provision of addi- 
tional noise shaping allows this advantage to be trans- 
ferred to the critical band around 4 kHz. 

Referring to the same table [29, table 1], we see that
the standard 50/15-µs deemphasis curve reduces the perceived
level of white noise by 3.4 dB, but it reduces the
threshold for the "N2.9" noise-shaped noise by 6.4 dB,
giving a total improvement of 21.7 dB relative to white
TPDF dither. An even greater advantage is obtainable
if the noise shaping is reoptimized with the preemphasis
in mind. In [29, table 3] we see an advantage of 22.4
dB quoted for the noise shaper "M2449P."

It is unfair to quote 22.4 dB as the advantage over 
the conventional CD, since the 3.4-dB preemphasis ad- 
vantage has been available right from the launch of CDs. 
Furthermore, the use of preemphasis entails providing
extra headroom due to increased high-frequency levels,
and this provision may be comparable to or in excess of
3.4 dB (which is one of the reasons why preemphasis
has not found universal favor so far). We therefore consider
it more correct to quote 22.4 dB - 3.4 dB = 19 dB
as the advantage of this scheme over conventional
technology.

To this we add the 4.18-dB subtractive dither advantage,
not quite the 4.77 dB quoted, for rather subtle reasons
[4] concerning the fact that the "N2.9" and "M2449P"
curves assume use of "Lipshitz high-pass dither" [30]
rather than white TPDF dither. Thus the total advantage
of the new technology could be claimed as 19 dB + 4.18
dB = 23.18 dB.

³ If the point of reference for conventional CDs were to be
taken as that using a quantizer with white Gaussian dither, as
was the case a few years ago, then the reference would have
a signal-to-noise ratio of 92.1 dB.

From this it could naively be concluded that we could
cull 23.18/6.02 = 3.85 bit from each channel and still
claim performance equivalent to conventional CDs. If
fractional-bit data rates are feasible, this translates to a
rate of 3.85 x 44.1 x 2 = 339.57 kbit/s.
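The naive estimate can be reproduced step by step. The sketch below only restates the arithmetic of the preceding paragraphs, using the 44.1-kHz CD sampling rate and two channels; the constant and function names are illustrative.

```python
DB_PER_BIT = 6.02        # signal-to-noise change per bit of word length
SAMPLE_RATE_KHZ = 44.1   # CD sampling rate
CHANNELS = 2

def buried_rate_kbit_s(bits_per_channel):
    """Total buried data rate for a stereo CD, in kbit/s."""
    return bits_per_channel * SAMPLE_RATE_KHZ * CHANNELS

noise_shaping_db = 22.4 - 3.4            # shaper advantage less the preemphasis share
total_db = noise_shaping_db + 4.18       # plus the subtractive dither advantage
bits = round(total_db / DB_PER_BIT, 2)   # bits per channel, rounded as in the text
print(round(total_db, 2), bits, round(buried_rate_kbit_s(bits), 2))
```

The same function gives 220.5 kbit/s for a culling rate of 2.5 bit per channel.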

However, there are some points that this discussion 
has overlooked. 

1) As pointed out in [29] and elsewhere, the threshold
reduction advantage for a noise-shaping curve (as
quoted) is not the same as the reduction in loudness if
the replay level is such that the hiss is actually audible.
In principle, this will be the case for some CDs played
at realistic levels [31].

2) If the noise-shaped noise is loud enough to be 
audible its subjective quality is also important, and 
heavy HF boosts involved in some of the noise shapers 
have been questioned in this regard. 

3) There is also a "risk" element. It is possible that 
certain listeners with exceptional HF hearing may find 
the standard noise-shaping curves (optimized for aver- 
age or typical listeners) extremely objectionable. To the 
authors' knowledge, this point has not been investigated 
experimentally. There could also be problems with 
tweeters with a resonant peak within the audible spectrum,
or with listeners who use EQ for increased treble.

4) As is well known, although noise shaping reduces 
the audibility of the noise, the total noise power is in- 
creased, typically by around 20 dB for the noise shapers 
discussed. This noise power does become audible, how- 
ever, when data errors on cheap CD players cause noise 
sidebands in the spectral regions where the ear is more
sensitive. If we apply the same noise shaping to a chan- 
nel where 4 bit have been culled for buried data, the 
total noise power is now about + 44 dB relative to an 
ordinary disk (about - 49 dB relative to peak level), and 
it seems quite plausible that this could cause operational 
problems in domestic playback. 

In order to address these problems, one of us (MAG)
(unpublished) has devised a family of "moderate" noise-shaping
curves, which provide a more modest noise-shaping
advantage, but also much lower levels of HF
boost. Furthermore, the subjective quality of the noise
has also been taken into account. Our preferred option
is a curve that provides 9.2 dB of psychoacoustic advantage
in the nonpreemphasized case, with an increase in
total noise power of only about 6.6 dB. The corresponding
advantage in the preemphasized case is 12.5 dB.

Taking into account the 4.77-dB potential advantage 
from subtractive dither, we have a total advantage of 
13.97 dB in the nonpreemphasized case, or 17.27 dB in 
the preemphasized case. Hence it seems safe to claim a 
culling of 2.5 bit per channel, giving a total buried data 
rate of 220.5 kbit/s, with a signal-to-noise ratio 2 dB
better than the conventional preemphasized CD and no 
significant disadvantages. 
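The totals for the "moderate" noise shaper follow from simple addition, and dividing by 6.02 dB per bit gives a rough break-even culling rate. The break-even conversion is a simplification assumed for this sketch (Table 1 is the authoritative source), and the names are illustrative.

```python
SUBTRACTIVE_DB = 4.77   # potential advantage from subtractive dither
DB_PER_BIT = 6.02       # signal-to-noise change per bit of word length

def break_even_bits(shaper_db):
    """Bits per channel that use up the shaper plus subtractive-dither advantage."""
    return (shaper_db + SUBTRACTIVE_DB) / DB_PER_BIT

for label, shaper_db in (("no emphasis", 9.2), ("with emphasis", 12.5)):
    total = shaper_db + SUBTRACTIVE_DB
    print(f"{label}: {total:.2f} dB, break-even {break_even_bits(shaper_db):.2f} bit")
```

The nonpreemphasized total of 13.97 dB corresponds to a little over 2.3 bit per channel on this simple model.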

For the chosen "moderate" psychoacoustic noise
shaper, the break-even point for data giving conventional
CD subjective performance is thus about 2 1/3 bit culled
from each of the two stereo channels. Clearly, for other
choices of noise shaper with different tradeoffs between
perceived noise reduction and data-error-noise risk, the
amount that could be culled may be larger.

Insight into the various tradeoffs can be gained by
perusal of Table 1. In the first row we see displayed the
conventional 93.3-dB signal-to-noise ratio, the 3.4-dB
perceived improvement from deemphasis, and the 4.77-dB
improvement from subtractive dither, implemented
using autodither [3] or by other means.

The second row explores the possibility of adding just 
enough noise shaping to restore the final spectrum to 
flat, in the deemphasized case. A simple noise-shaping 
filter with just one pole and one zero will do this, and 
we see a further improvement of 3.6 dB. The reduction 
in noise level relative to the nonpreemphasized case is 
6.7 dB. However, Table 1 does not take account of extra 
headroom requirements of preemphasized signals, so it 
is probably safer to quote 3.6 dB as the practical 
advantage. 

The authors believe that this possibility of a preemphasized
disk, but with a flat noise spectrum on replay,
deserves a wider consideration. It will have none of the 
operational problems listed earlier (save possibly for 4) 
in a very mild form), and thus one obtains a 3.6-dB 
advantage "for free" on all signals except those with 
exceptionally high peak treble energy content. Such ex- 
ceptional signals have low probability according to the 
data of Fielder [32]. 

The third row of Table 1 introduces the "moderate"
psychoacoustic noise shaping discussed earlier. It can
be implemented on its own or, in the preemphasized
case, "on top of" the noise shaping that flattens the
deemphasized noise spectrum. In either case the (further)
advantage is 9.2 dB, with still a very low risk of
operational problems.

Subsequent rows of Table 1 show the effect of culling
up to 3 bit of data from each channel in 1/2-bit increments
(using, for example, stereo parity coding, Sections 2.2
and 6.3). In the autodither decode case the signal-to-noise
ratio is worsened by 3 dB for each 1/2 bit culled,
as might be expected. For the listener with an ordinary
player, the slope is less steep. Here we see the advantage
of subtractive buried data: the culled bits behave as
subtractive dither, and by the time 3 bit have been culled,
the signal-to-noise ratio is only 0.2 dB worse than for
the listener with a fully subtractive player.

Referring to the entry in the second column and penultimate
row of Table 1, we see that a preemphasized
signal with 2.5 bit culled per channel
gives a signal-to-noise ratio of 98.7 dB for the ordinary
listener, compared to 96.7 dB for the ordinary preemphasized
disk. This is the basis for our claim of a
signal-to-noise ratio 2 dB better than conventional practice
with a data rate of 220.5 kbit/s total.
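The data-rate labels of Table 1 and the 3-dB-per-half-bit slope for the autodither listener can be regenerated as a consistency check. This sketch assumes the 44.1-kHz stereo carrier and starts from the 107.3-dB no-emphasis, 0-bit autodither entry; the function name is hypothetical.

```python
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2

def table_row(bits_culled, snr_zero_bits_db):
    """Data rate (kbit/s) and autodither-listener SNR (dB) for a culling rate."""
    rate = bits_culled * SAMPLE_RATE_HZ * CHANNELS / 1000
    snr = snr_zero_bits_db - 3.0 * (bits_culled / 0.5)  # 3 dB per 1/2 bit
    return rate, snr

for half_bits in range(1, 7):
    bits = half_bits / 2
    rate, snr = table_row(bits, 107.3)
    print(f"{bits:>3} bit  {rate:6.1f} kbit/s  {snr:6.1f} dB")
```

The compatible-listener columns do not follow this straight line, which is the "less steep" slope noted above.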



Table 1. Psychoacoustic signal-to-noise ratio (in dB) of 16-bit-per-channel stereo carrier with and without buried data.*

                                                   Compatible Listener          Autodither Listener
Buried Data Culling Rate                        No Emphasis  With Emphasis   No Emphasis  With Emphasis

0 bit, standard TPDF dither                          93.3          96.7           98.1         101.3
0 bit, noise shaped to flat                          93.3         100.0           98.1         104.8
0 bit, "moderate" psychoacoustic noise shaping      102.5         109.2          107.3         114.0
1/2 bit = 44.1 kbit/s                               101.3         108.0          104.3         111.0
1 bit = 88.2 kbit/s                                  99.5         106.2          101.3         108.0
1 1/2 bit = 132.3 kbit/s                             97.3         104.9           98.3         105.0
2 bit = 176.4 kbit/s                                 94.7         101.4           95.3         102.0
2 1/2 bit = 220.5 kbit/s                             92.0          98.7           92.3          99.0
3 bit = 264.6 kbit/s                                 89.1          95.8           89.3          96.0

* The buried-data case assumes use of moderate psychoacoustic noise shaping. The figure quoted is based on the ratio of the
power in the psychoacoustically equivalent white noise, compared to the maximum sine-wave power at low frequencies. No
allowance has been made for loss of headroom in the preemphasized cases.
EMBEDDING A FIRST DIGITAL INFORMATION SIGNAL INTO A SECOND DIGITAL INFORMATION SIGNAL FOR TRANSMISSION VIA A TRANSMISSION MEDIUM



The invention relates to a transmitter for transmitting a first and second digital
information signal via a transmission medium, said first digital information signal comprising
first frames having at least a first synchronization signal and a data portion stored in them,
the transmitter comprising:

- input means for receiving the first and second digital information signal;

- processing means for processing the second digital information signal into subsequent
second frames, said second frames comprising blocks of information of the second digital
information signal;

- signal combination means for inserting a second synchronisation signal and at least the data
portion of a first frame into a second frame of the second digital information signal so as to
obtain a composite frame;

- output means for supplying the composite frames to an output terminal so as to obtain a
composite signal to be transmitted.

The invention further relates to a receiver for receiving a composite signal
from a transmission medium and generating a first and a second digital information signal, to
a record carrier obtained with the transmitter, when in the form of an apparatus for recording
information on a record carrier, and to a transmission method.



Transmitters and receivers defined above are commonly known in the form of
transmitters for transmitting an MPEG encoded signal. Transmission systems usually use
multiple layers. Synchronisation becomes possible only by the use of sync patterns in these
layers. However, these sync patterns in a system having multiple sync patterns reduce the
transmission efficiency. For example, in DVD-Video sync patterns are used in both the
system stream layers as well as in the elementary stream layers. Only the sync pattern in the
highest system layer is used for synchronisation on the system stream. The sync patterns in
the elementary streams are used for synchronization during decoding of said elementary stream.
Further, DAB uses sync patterns in both the system stream layer as well as in the elementary
stream layer. However, a decoder uses only one of both.

The invention aims at providing transmitters and receivers having a more
efficient method of transmitting and receiving a first and a second digital information signal
whereby said first digital information signal comprises first frames having at least a second
synchronization portion.

The transmitter in accordance with the invention is characterized in that signal 
combination means are adapted to strip the first synchronization signal from said first frames 
prior to inserting at least the data portion of the first frames into the second frames. 

The receiver in accordance with the invention is characterized in that the
receiver further comprises:

- synchronization signal generator means for generating a first synchronization signal;

- signal combination means for combining the first synchronization signal and at least the
data portion of the first digital information signal so as to obtain a first frame of the first
digital information signal;

- second output means for subsequently supplying the first frames of the first digital
information signal to a first output terminal so as to obtain the first digital information signal.

The invention is based on the following recognition. In, for example, a buried
data channel in a PCM signal any other information signal may be stored. To be able to
retrieve the information signal from said buried data channel, the buried data channel
comprises frames, each frame having a synchronization signal. After detecting said
synchronization signal, a frame from the buried data channel can be retrieved from the PCM
signal. If the information signal stored in the buried data channel is an encoded signal
comprising a sequence of frames each having a synchronization signal, for example an
MPEG encoded signal, said synchronization signal has to be retrieved in a receiver to be able
to decode said sequence of frames. However, if each frame in the buried data channel
comprises only one frame of the encoded signal, said synchronization signal in a frame of the
encoded signal need not be transmitted, since said synchronization signal can be
generated in the receiver each time a frame in the buried data channel is retrieved. Thus, in a
transmitter, prior to inserting a frame of the encoded signal into the buried data channel, the
synchronization signal is stripped from said frame. In a receiver the synchronization signal is
generated and combined with the data retrieved from a frame of the buried data channel so as
to obtain a frame of the encoded signal. By doing this, the data capacity needed to transmit an
additional signal comprising a sequence of frames is reduced. This reduction may be used to
use less capacity in the PCM signal for the buried data channel, resulting in a higher quality
PCM signal. On the other hand, the extra data capacity in the buried data channel, obtained
by removing the synchronization signal, may be used for transmitting a less compressed data
signal, which is normally a better representation of the data signal.

These and other objects of the invention will become apparent from and will be
elucidated further with reference to the embodiments described in the following figure
description, in which

figure 1 shows an embodiment of a transmitter in accordance with the invention,
figure 2 shows an embodiment of a receiver in accordance with the invention,
figure 3 shows a buried data frame structure with header,
figure 4 shows the de-randomization circuit,
figure 5 shows the specific order in which the bits in the buried_data_frame need to be
inserted into the de-randomization circuit,
figure 6 shows a CRC-check diagram,
figure 7 shows how a frame of 1152 stereo PCM samples corresponds to 192 F3
frames,
figure 8 shows a buried data frame structure without header,
figure 9 shows the distribution of encrypted MPEG2 audio data over the buried
data channel and the physical channel.

Figure 1 shows an embodiment of a transmitter in accordance with the invention.
The transmitter has a first input terminal 4 for receiving a first digital information signal.
Said first digital information signal comprises first frames. The first frames comprise at least
a first synchronization signal and a data portion. The first digital information signal could be
an MPEG encoded signal. The transmitter has a second input terminal 2 for receiving a
second digital information signal. The second digital information signal is for example a
normal CDDA signal (Compact Disc Digital Audio). The second digital information signal is
supplied to a processing unit 6. The processing unit 6 divides the second digital information
signal into subsequent blocks of information. From the subsequent blocks of information the
processing unit 6 generates subsequent second frames. In a preferred embodiment the second
digital information signal is a normal CDDA signal having PCM samples. Preferably, a
second frame comprises 1152 PCM samples. Each frame consists of 3 PCM sub-frames,
each having 384 PCM samples. It should be noted that a division into 9 PCM sub-frames, each
having 128 PCM samples, is suitable as well.

The transmitter further comprises a sync generator unit 8 for generating a
second synchronization signal. The second synchronization signal is supplied to a signal
combination unit 10. The combination unit 10 preferably makes use of buried data
techniques to determine a buried data channel in the PCM samples of a second frame. By
using buried data techniques the perceived S/N ratio of the transmitted PCM signal, which
comprises a buried data channel in the least significant bits of the PCM samples, is
approximately the same as the S/N ratio of the original PCM signal. The combination unit 10
inserts the second synchronization signal in the buried data channel. Preferably, the
synchronization signal is inserted in the second frame such that the frame starts with a sync
pattern in the two least significant bits of its first 6 L+R PCM samples. The data to be stored
in the buried data channel is preferably inserted in the PCM L and R channel on a sample by
sample interleaving basis. Figure 3 shows an embodiment of a second frame. Each second
frame starts with header information. The header information of each frame comprises the
synchronization signal and the bit allocation of the 3 sub-frames, which defines the PCM bits
belonging to the buried data channel. Furthermore, the buried data frame payload shown is an
example of the least significant bits (LSBs) of the L + R PCM samples which are determined
by buried data techniques to be used to carry data bits of the buried data channel. Figure 5
shows an example of how the bits in the buried data frame could be inserted. Firstly, the header is
alternately stored in the LSBs of the first 4 Left and Right PCM samples of the first
sub-frame. Next, the data bits are alternately inserted in the allocated buried data payload. In
Figure 5 the 3 LSBs of the PCM samples of the Left channel and the 2 LSBs of the Right
channel are allocated to store data. The number in the squares indicates the sequence in
which the bits are stored in the buried data payload.
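The insertion order described for figure 5 may be sketched as follows. This is a minimal sketch under stated assumptions, not the layout of the figure itself: it assumes that the header bits occupy the LSB (bit 0) of alternating left and right samples, that the payload is then visited sample by sample with the left channel before the right channel, that bits within a sample are taken MSB first, and that LSB positions already holding header bits are skipped. The function name `read_order` and its parameters are hypothetical.

```python
def read_order(header_bits, alloc_left, alloc_right, n_samples):
    """Return (sample, channel, bit) triples in the assumed order; bit 0 is the LSB."""
    order = []
    # Header first: one bit per LSB, alternating left and right channel.
    for i in range(header_bits):
        order.append((i // 2, "LR"[i % 2], 0))
    used = set(order)
    # Then the payload: sample by sample, left before right, MSB first.
    for n in range(n_samples):
        for channel, alloc in (("L", alloc_left), ("R", alloc_right)):
            for bit in range(alloc - 1, -1, -1):
                if (n, channel, bit) not in used:
                    order.append((n, channel, bit))
    return order


# Simplified case: 8 header bits, 3 LSBs left, 2 LSBs right, 5 stereo samples.
order = read_order(header_bits=8, alloc_left=3, alloc_right=2, n_samples=5)
print(order[:3])    # first header positions
print(order[8:11])  # first payload positions
```

With 8 header bits the header occupies the LSBs of the first 4 left and right samples, which agrees with the text; the ordering of the payload positions remains an assumption.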

The signal combination unit 10 is arranged to insert at least the data of a first
frame into the buried data frame payload. Firstly, the first synchronization signal is stripped
from the first frame by unit 12. Further, prior to writing the data portion of the first frame in
the buried data frame payload, the data portion of the first frame is randomized. Owing to the
randomization, a burst of errors in the buried data frame payload will not immediately result in
uncorrectable errors in the data of the buried data channel. Finally, the signal combination
unit 10 is arranged to store in the last 16 bits of the buried data frame payload a CRC-16
word for error detection purposes. To this end, the data bits inserted in the buried data channel
are fed through an LFSR (Linear Feedback Shift Register) with, for example, polynomial
0x8005. The final state of the LFSR is stored in the buried data CRC-16 word. The thus
obtained composite frame is supplied to an output terminal. In the event that there is no
capacity in the PCM samples for a buried data channel, only the header information is
inserted in a second frame.

The functioning of the transmitter is as follows. The PCM frame consists of 3
sub-frames each having 384 PCM samples. The 1152 PCM samples in a PCM frame
represent a time length which matches exactly the MPEG-2 Audio Layer II frame length. In
IEC-61937 formatting the first 16 bits of an MPEG Audio frame are unique for the CD
Surround application (0xFFFC, 12 bit sync + ID=mpeg-1 + Layer=II + protection=used). As
the time length of a PCM frame is equal to the time length of an MPEG frame, the first 16
bits of an MPEG frame need not be transmitted. In a receiver said 16 bits have to be placed
before the extracted and decoded buried data. Furthermore, a preamble consisting of two
sync words, an identification word and a payload length word has to be placed before the MPEG
Audio frame, and finally the IEC frame has to be padded with zeros. The transmitter receives
the CDDA PCM samples and generates subsequent frames each having 1152 PCM samples.
The available capacity for a buried data channel is determined. Further, the transmitter
receives the MPEG audio frame and strips the first 16 bits from said frame. The remaining bits
of said frame are randomized and a CRC word is determined for the remaining bits. To
obtain the composite signal, firstly the header information is inserted in the LSB of the first
PCM samples in a frame. Secondly, the randomized bits are inserted in the buried data
channel payload. Finally, the CRC word is inserted in the last 16 bits of the buried data frame
payload. The thus obtained composite signal is transmitted via a transmission medium.

The buried data channel is preferably used to transmit extra audio content
within the 16-bit audio PCM data on a normal Audio CD. This extra audio content is
preferably compressed according to the MPEG Audio standard. As the first 16 bits of an
MPEG Audio frame are unique for the CD Surround application, they are not transmitted. In
a CD Surround decoding apparatus comprising a receiver, which will be described below,
these 16 bits are placed in front of the bits extracted from the buried data channel stored in
the PCM data.
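The stripping and regeneration of the fixed first 16 bits may be illustrated as follows. This is a minimal sketch which assumes that a frame is handled as a byte string; the names `strip_prefix` and `restore_prefix` are hypothetical and only the value 0xFFFC is taken from the description.

```python
# The fixed first 16 bits of the MPEG Audio frame named in the description.
MPEG_HEADER_PREFIX = bytes([0xFF, 0xFC])


def strip_prefix(mpeg_frame: bytes) -> bytes:
    """Transmitter side: drop the fixed first 16 bits before burying the frame."""
    if mpeg_frame[:2] != MPEG_HEADER_PREFIX:
        raise ValueError("frame does not start with the expected 16-bit pattern")
    return mpeg_frame[2:]


def restore_prefix(buried_payload: bytes) -> bytes:
    """Receiver side: regenerate the 16 bits and place them in front of the data."""
    return MPEG_HEADER_PREFIX + buried_payload


frame = MPEG_HEADER_PREFIX + bytes([0x12, 0x34, 0x56])  # made-up frame content
assert restore_prefix(strip_prefix(frame)) == frame
```

At the CD-DA sample rate of 44.1 kHz a frame of 1152 samples recurs 44100/1152 times per second, so omitting 16 bits per frame frees roughly 612 bit/s of buried data capacity.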

Figure 2 shows an embodiment of a receiver for receiving a composite signal
and generating a first and second digital information signal therefrom. The composite signal
comprises composite frames. A composite frame has a second synchronization signal. The
receiver has an input terminal 20 for receiving the composite signal. The composite signal is
supplied to a detection unit 22 and a unit 24. The detection unit 22 is arranged for detecting a
second synchronization signal and generating a detection signal in response to a detected
second synchronization signal. The detection signal is supplied to a control input of unit 24.
Unit 24 is arranged for retrieving a composite frame from the composite signal in response to
the detection signal. The composite frames are supplied to a first extraction unit 26 and a
second extraction unit 28. The first extraction unit 26 is arranged for extracting at least a data
portion of a first frame of the first digital information signal from a composite frame. The
data portion of a first frame is supplied to signal combination unit 32. The second extraction
unit 28 is arranged for extracting at least a part of the second digital information signal from
a composite frame so as to obtain a second frame of the second digital information signal.
The subsequent second frames, which form the second digital information signal, are
supplied to output terminal 30.

The receiver further comprises a synchronization signal generator unit 34. 
The synchronization signal generator unit 34 is arranged for generating a first 
synchronization signal. The first synchronization signal is supplied to the signal combination 
unit 32. The signal combination unit 32 is arranged for combining the first synchronization 
signal and at least the data portion of a first frame so as to obtain the first frame of the first 
digital information signal. The subsequent first frames are supplied to output terminal 36. 
The subsequent first frames form the first digital information signal. 

The receiver described above functions as follows. The composite signal is
received at the input terminal 20. A transmitter as described above generates the composite
signal. The composite signal is a CDDA signal having left and right PCM samples. The
CDDA signal comprises frames as disclosed in figure 3. The CDDA signal comprises a
buried data channel. To be able to retrieve the buried data from the CDDA signal, each frame
comprises header information. The header information comprises a second synchronization
signal. In this embodiment the second synchronization signal is in the two least significant
bits of the first 6 L + R PCM samples of each frame. However, other ways to insert the
second synchronization signal are possible, for example in the least significant bit of the first
12 L + R PCM samples. The synchronization signal detection unit 22 detects the second
synchronization signal and generates a detection signal in response thereto. Unit 24 retrieves
the composite frames from the CDDA signal under control of the detection signal. An
embodiment of a frame is disclosed in figure 3. The second extraction unit 28 receives the
second frames to generate the second digital information signal. As in this embodiment a
buried data channel is used, there is no need to extract the unmodified bits of the original
signal from the PCM samples of the second frame. In the event that the n LSBs of each PCM
sample are used to carry the first digital information signal, these bits will introduce audible
noise. To reduce the audible noise, the MSBs of the PCM samples have to be extracted from
the second frames.

The synchronization signal generator unit 34 generates a first synchronization
signal. In the event that the first digital information signal is an MPEG-2 Audio Layer II signal,
the first 16 bits of each frame are unique for the CD Surround application (0xFFFC, 12 bit
sync + ID=mpeg-1 + Layer=II + protection=used). Furthermore, a preamble comprising
two sync words and an identification word is needed. The first synchronization signal comprises at least
this information. The first extraction unit 26 extracts the header information from the second
frames. The header information comprises, next to the sync signal information, the bit
allocation of the sub-frames. The bit allocation defines the bits of the PCM samples
belonging to the buried data channel, and thus the buried data frame payload. Next, unit 26
extracts the buried data from the second frames. Preferably, the buried data bits are
written in the buried data channel in randomized form. Figure 4 shows an embodiment of a
circuit to de-randomize the buried data bits. The circuit comprises an array of delays and exclusive ORs.
The delays perform a one-bit delay function. The reference t_n represents inputted buried data
bit n and s_n represents outputted de-randomized bit n. The circuit disclosed in figure 4
performs the following operation: out = z[16] ^ z[14] ^ z[3] ^ z[1] ^ z[0], where "^" is the
logical exclusive-or operator and z[n] is the bit extracted n bits back. At the start of a new
frame the state z has to be initialized with all ones. The de-randomized data of a frame is
supplied to combination unit 32. The first extraction unit preferably comprises a CRC
checking circuit. A diagram of said circuit is disclosed in figure 6. The last 16 bits of the
buried data contain a CRC-16 word for error detection purposes. Each de-randomized buried
data bit, except for the last 16, is fed through an LFSR (Linear Feedback Shift Register) with
polynomial 0x8005, as disclosed in figure 6. The final state of the LFSR has to be compared
with the buried data CRC-16 word. If these two words are not the same, a transmission error
has occurred.

Combination unit 32 receives the de-randomized data and calculates the
payload length of the de-randomized data, which forms an MPEG Audio frame of the first digital
information signal. The combination unit 32 combines the first synchronization signal generated by unit
34 and the calculated payload length so as to obtain the preamble of an MPEG Audio frame. The
de-randomized data is placed after the preamble. In the event that the bit length of the preamble
and the de-randomized data is not in accordance with the length of an MPEG Audio frame, the
frame has to be padded with zeros so as to obtain the correct frame length. The thus
obtained frames are supplied to output terminal 36 to provide the first digital information
signal at the output of the receiver.
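The assembly performed by combination unit 32 may be sketched as follows. This is a minimal sketch under several assumptions: the two preamble sync words are taken to be the IEC 61937 words Pa = 0xF872 and Pb = 0x4E1F (quoted from memory, to be checked against the standard), words are packed big-endian, the payload length is expressed in bits, and the identification word and the total burst length are left as caller-supplied parameters because the description does not give them. The name `build_iec_burst` is hypothetical.

```python
import struct

# Assumed IEC 61937 preamble sync words; verify against the standard before use.
PA, PB = 0xF872, 0x4E1F
# The first 16 bits of the MPEG Audio frame, regenerated in the receiver.
MPEG_FIRST_16_BITS = bytes([0xFF, 0xFC])


def build_iec_burst(buried_data: bytes, id_word: int, burst_bytes: int) -> bytes:
    """Preamble + regenerated 16 bits + de-randomized data, zero-padded to burst_bytes."""
    mpeg_frame = MPEG_FIRST_16_BITS + buried_data
    length_word = len(mpeg_frame) * 8  # payload length in bits (assumed unit)
    preamble = struct.pack(">4H", PA, PB, id_word, length_word)
    burst = preamble + mpeg_frame
    if len(burst) > burst_bytes:
        raise ValueError("data does not fit in the requested burst length")
    return burst + bytes(burst_bytes - len(burst))  # pad with zeros


# id_word=0x0001 is a placeholder value, not a code taken from any standard.
burst = build_iec_burst(b"\x01\x02", id_word=0x0001, burst_bytes=32)
print(burst.hex())
```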

As described above, the first six PCM samples of a PCM frame contain the
first 24 bits of a buried data frame, being a sync pattern. These 24 bits preferably contain the
code 0xF87E1F (1111 1000 0111 1110 0001 1111).

It can be noted that, as the number of bits in a buried data frame is always a
multiple of eight, the de-randomization can be done very efficiently per eight bits. The
CRC-16 calculation can also make use of this fact. Further, in the described format two bits are
reserved in the buried header. These bits may be used for a possible future extension with a
physical channel and/or copy protection mode.

In accordance with the invention only one sync pattern is transmitted and the
other sync and unique patterns of the MPEG Audio frames are regenerated in the receiver.

The extraction of the buried data payload contained in uniquely decodable
buried data frames of 1152 stereo PCM samples, as performed by the receiver, will now be described
in more detail. A buried data frame is subdivided into three buried data subframes of 384
samples each. Each subframe for each channel has an individual allocation, which is denoted
by alloc[ch][sub_frame]. For the corresponding channel "ch" and subframe "sub_frame", this
allocation indicates the number of LSBs of the PCM sample that is used to carry the buried
data frame. The header information is always contained in the LSB of the PCM samples. The
applied frame structure is depicted in figure 3. In this example the allocation of the buried
data subframes is as given in table 1.



alloc[ch][subframe]     ch = 0     ch = 1
subframe = 0               0          2
subframe = 1               1          2
subframe = 2               2          [illegible]

Table 1: subframe allocation.
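By way of illustration, the raw number of bits that a given allocation makes available per frame, and the corresponding bit rate, can be estimated as sketched below. The sketch assumes the CD-DA sample rate of 44.1 kHz, ignores header and CRC overhead, and uses hypothetical allocation values that are not those of table 1; the function names are likewise hypothetical.

```python
SUBFRAME_SAMPLES = 384   # samples per buried data subframe (from the description)
FRAME_SAMPLES = 1152     # samples per buried data frame (from the description)
CD_SAMPLE_RATE = 44_100  # Hz, CD-DA


def raw_frame_bits(alloc):
    """alloc[ch][sub_frame] -> allocated LSBs in one frame, overhead not subtracted."""
    return sum(SUBFRAME_SAMPLES * bits for channel in alloc for bits in channel)


def raw_bit_rate(alloc):
    """Raw buried data rate in bit/s for a constant allocation."""
    return raw_frame_bits(alloc) * CD_SAMPLE_RATE / FRAME_SAMPLES


# Hypothetical allocation: 3 LSBs in the left and 2 LSBs in the right channel
# for each of the three subframes.
example_alloc = [[3, 3, 3], [2, 2, 2]]
print(raw_frame_bits(example_alloc))  # 5760
print(raw_bit_rate(example_alloc))    # 220500.0
```

Five buried bits per stereo sample thus correspond to a raw rate of 220.5 kbit/s.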



In order to extract the correct number of LSBs that are used to hold the buried
data payload, the header needs to be read and interpreted first. Dependent on the allocation
information in the header, the remaining LSBs of the PCM samples that contain the header
may hold buried data payload.

For perceptual control of the header information and the buried data payload,
all the LSBs contained in buried_data_frame, except for the syncword, have to pass bit by
bit through a de-randomization circuit prior to interpretation. The de-randomization circuit is
illustrated in figure 4. The following polynomial is applied:

$$ s_n = t_n \oplus t_{n-1} \oplus t_{n-3} \oplus t_{n-14} \oplus t_{n-16} $$

At the start of every frame all the states T_j are initialized to the binary value 1.

Figure 4 shows the de-randomization circuit. The blocks T represent shift
registers. The additions represent exclusive-or gates. At the start of every frame the shift
registers are initialized to the binary value 1. For every newly inserted input bit t_n, a new output
bit s_n is generated.
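The operation of the de-randomization circuit may be sketched in software as follows. The function `derandomize` follows the stated taps (1, 3, 14 and 16 bits back) and the all-ones initialization. The function `randomize` is not given in the description; it is the inverse inferred here for illustration, under the assumption that the transmitter feeds its own output bits back through the same taps.

```python
def derandomize(bits):
    """s_n = t_n ^ t_(n-1) ^ t_(n-3) ^ t_(n-14) ^ t_(n-16), state preset to ones."""
    history = [1] * 16  # history[k - 1] holds the input bit t_(n-k)
    out = []
    for t in bits:
        out.append(t ^ history[0] ^ history[2] ^ history[13] ^ history[15])
        history = [t] + history[:-1]
    return out


def randomize(bits):
    """Assumed inverse: t_n = s_n ^ t_(n-1) ^ t_(n-3) ^ t_(n-14) ^ t_(n-16)."""
    history = [1] * 16  # previously transmitted (randomized) bits
    out = []
    for s in bits:
        t = s ^ history[0] ^ history[2] ^ history[13] ^ history[15]
        out.append(t)
        history = [t] + history[:-1]
    return out


data = [1, 0, 1, 1, 0, 0, 1, 0] * 5
assert derandomize(randomize(data)) == data
```

Because the de-randomizer depends only on the received bits and not on its own output, a single bit error affects at most five output bits and does not propagate further.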

The bits have to be inserted into the de-randomization circuit in a specific
order, which is explained in figure 5.

Figure 5 shows that the bits in the buried_data_frame need to be inserted into the
de-randomization circuit in a specific order. In the figure this is explained by means of a
simplified header and buried data payload. Assume that the syncword is only 2 bits and
the remaining header is 6 bits. As illustrated in the figure, the allocation for the first subframe
is 3 LSBs for the left channel and 2 LSBs for the right channel. The synchronization bits
labeled "1" and "2" are read first and do not pass through the de-randomization circuit. The
remaining bits are read in the indicated order. This order is "header first", whereby the
left and right channel are read alternately. After that, the bits are read MSB first. All the bits labeled
"3..." have to pass through the de-randomization circuit prior to interpretation.

The first action performed in a receiver is synchronization of the decoder to
the CD-DA PCM samples. The syncword is contained in the LSB of the PCM samples
representing the left and the right channels. The distance between two consecutive syncwords
amounts to 2*1152 mono PCM samples or 1152 stereo PCM samples. In order to retrieve the
syncword, a bit-stream is generated by successively concatenating the LSB of the PCM
sample corresponding to the left channel and the LSB of the PCM sample corresponding to
the right channel. The last 16 bits of this bitstream are continuously compared to the
syncword. Only if there is a match for all 16 bits is synchronization achieved.
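The synchronization search described above may be sketched as a sliding comparison over the interleaved LSB stream. This is a minimal sketch; the function name `find_sync` is hypothetical, and the 16-bit pattern used in the example is a made-up value, since the description does not state the syncword of this embodiment.

```python
def find_sync(samples, sync_word, sync_bits=16):
    """samples: iterable of (left, right) PCM integers.

    Returns the index of the stereo sample holding the last syncword bit,
    or None when the pattern is not found.
    """
    mask = (1 << sync_bits) - 1
    window = 0
    seen = 0
    for index, (left, right) in enumerate(samples):
        for value in (left, right):  # LSB of left, then LSB of right
            window = ((window << 1) | (value & 1)) & mask
            seen += 1
            if seen >= sync_bits and window == sync_word:
                return index
    return None


# Example: three stereo samples with LSB 0, then a hypothetical 16-bit pattern.
sync = 0xA5C3
bits = [0] * 6 + [(sync >> (15 - i)) & 1 for i in range(16)]
samples = [(0x1000 | bits[i], 0x2000 | bits[i + 1]) for i in range(0, len(bits), 2)]
print(find_sync(samples, sync))  # 10
```

The payload of the frame then starts at the stereo sample following the returned index.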

In another embodiment of a receiver two CRC checks are performed. The
error detection methods used are "CRC-4" and "CRC-16", whose generator polynomials are

$$ G(X) = X^{16} + X^{15} + X^{2} + 1 \qquad \text{(CRC-16)} $$

The bits included in the CRC-4 check are the bits after sync_word in the
header information. The bits included in the CRC-16 check run from the first bit after sync_word
in the header information to the position of the crc16_check. The CRC method is depicted in
the CRC-check diagram given in figure 6. For CRC-4, the initial state of the shift register is
$F. For CRC-16, the initial state of the shift register is $FFFF. All bits included in the CRC
check are input to the circuit shown in figure 6. After each bit is input, the shift register is
shifted by one bit. After the last shift operation, the outputs b_n...b_0 constitute a word to be
compared with the CRC-check word in the stream. If the words are not identical, a
transmission error has occurred in the field on which CRC-4 has been applied. To avoid
annoying distortions, application of a concealment technique, such as muting of the actual
frame or repetition of the previous frame, is recommended. Figure 6 shows a CRC-check
diagram. The addition blocks represent "exclusive or" gates.
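The CRC-16 check may be sketched bit by bit as follows. This is a minimal sketch which assumes that the bits enter the shift register most significant bit first and that no final inversion is applied; the polynomial 0x8005 and the initial state $FFFF are taken from the description, while the function names are hypothetical.

```python
def crc16(bits, poly=0x8005, init=0xFFFF):
    """Feed bits one at a time through a 16-bit LFSR for X^16 + X^15 + X^2 + 1."""
    reg = init
    for bit in bits:
        feedback = ((reg >> 15) & 1) ^ bit
        reg = (reg << 1) & 0xFFFF
        if feedback:
            reg ^= poly
    return reg


def frame_is_valid(data_bits, crc_word):
    """Receiver side: compare the final LFSR state with the stored CRC-16 word."""
    return crc16(data_bits) == crc_word


payload = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1]  # made-up data bits
crc_word = crc16(payload)
assert frame_is_valid(payload, crc_word)

# Flipping any single bit changes the final state, so the error is detected.
corrupted = payload.copy()
corrupted[5] ^= 1
assert not frame_is_valid(corrupted, crc_word)
```

In this form, feeding the 16 CRC bits through the register after the data leaves the register at zero, which offers an alternative way of performing the comparison.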

The following options of embedding the payload into the CD format are 
available. Firstly, by use of only the buried data channel. No use is made of a physical 
channel. All header information for extracting the buried data payload, such as 
synchronization and allocation information, is merged with the buried data. The payload 
represents an MPEG-2 base and extension frame. 

Secondly, by making use of both a buried data channel and a physical channel. 
The header information is preferably contained in the physical channel. This information is 
merged with the payload in the physical channel. The payload in the physical channel 
represents an MPEG-2 base frame. The buried data payload represents an MPEG-2 extension 
frame. 

Thirdly, by making use of only a physical channel. The control information is 
contained in the physical channel. This information is merged with the payload in the 
physical channel. The payload represents an MPEG-2 base and extension frame. 

In the case a physical limited multi level (LML) channel is present, it always
contains the header. Dependent on whether a second channel is used, as signaled by the
content_descriptor, the LML channel will contain either the MPEG-2 base frame alone or
additionally the MPEG-2 extension frame. If a buried data channel is used, the start of this
frame will be synchronized with the extracted payload from the LML channel.

Also in the case where a physical channel is used, either in combination with a
buried data channel or by itself, the framing structure of the MPEG-2 payload remains based
on frames of 1152 PCM stereo samples. A frame of 1152 PCM audio samples corresponds
to 192 F3-frames. An F3-frame consists of 24 (user) bytes. During disc formatting, the starts
of the frames of 1152 stereo PCM samples have been aligned with the F3-frames such that,
after incorporation of the decoding delay of the LML data as a result of error correction, the
data from the two channels is of the same frame. This is illustrated in figure 7.
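The correspondence between a frame of 1152 stereo PCM samples and 192 F3-frames follows from simple arithmetic, sketched below on the assumption of two channels of 16-bit PCM as used on CD-DA. The constant names are hypothetical.

```python
STEREO_SAMPLES_PER_FRAME = 1152
BYTES_PER_STEREO_SAMPLE = 2 * 2  # two channels, 16 bits (2 bytes) each
F3_FRAME_BYTES = 24              # user bytes per F3-frame

frame_bytes = STEREO_SAMPLES_PER_FRAME * BYTES_PER_STEREO_SAMPLE
f3_frames, remainder = divmod(frame_bytes, F3_FRAME_BYTES)

print(frame_bytes)  # 4608
print(f3_frames)    # 192
print(remainder)    # 0, so frame boundaries can coincide with F3-frame boundaries
```

Each F3-frame thus carries 6 stereo PCM samples, and 192 of them make up one frame.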

Figure 7 shows how a frame of 1152 stereo PCM samples corresponds to 192 F3
frames. At the moment the synchronization pulse is detected at the "synchronization point",
data at the "frame start point" becomes available from the physical channel. For that specific
frame, PCM data starts reading at the "synchronization point".

At any synchronization point, at least 111 F3-frames need to be available in
the buffer in order to have the proper amount of physical data available from that point
onwards. If this is not the case, decoding can only start at the next synchronization point.

The actual extraction of the physical payload is independent of the processing
related to the buried data channel. For each frame of 1152 CD-DA PCM samples, a fixed
amount of 290 kbytes of physical payload becomes available. The physical data becomes
available byte for byte and is interpreted MSB first. After the header information is read, the
data representing the encrypted MPEG-2 base (+extension) frame is read.

In the case the control information is not contained in the buried data channel,
the extraction of the payload can start at the first PCM sample of the left channel.
Synchronization and header information is contained in the physical channel. The "alloc"
information describes the number of embedded bits per buried data subframe. An example is
given in figure 8. Apart from the payload data, room is reserved for a CRC-16 that operates
on the full payload contained in the buried data channel. In the case the buried data payload is
zero, no CRC-16 is written.

The buried data payload and additionally, if present, the physical payload
within one frame of 1152 CD-DA PCM samples represent one encrypted MPEG-2 audio
bitstream that contains 1152 multi-channel audio PCM samples. In the case no physical
channel is used, the buried data payload represents a complete encrypted MPEG-2 audio
stream (base plus extension). In the case a physical channel is used, the buried data payload
represents an MPEG-2 extension stream and the physical payload represents the encrypted
MPEG-2 base frame stream. The number of bits contained in an encrypted MPEG-2 base
frame may not exceed the capacity available in the LML channel. The number of bits
contained in the encrypted MPEG-2 extension frame is variable and is a multiple of 8 bits.
The division described above is illustrated in figure 9.

In the case a physical channel is used, the encrypted MPEG-2 base frame bits
for the corresponding frame are extracted and put in front of buried_data_bits. It should be
noted that a record carrier with a physical channel is known from USP 5,210,738 and USP
5,724,327 (PHN 13.992).

The complete bit-stream (base+extension) is decrypted and subsequently
MPEG-2 decoded, resulting in 1152 multi-channel PCM audio samples.

For the decoding of MPEG2 audio data reference is made to ISO/IEC 13818- 



Whilst the invention is described with reference to preferred embodiments
thereof, it is to be understood that these are non-limitative examples. Thus various
modifications may become apparent to those skilled in the art, without departing from the
scope of the invention, as defined by the claims.

The word 'comprising' does not exclude the presence of other elements or
steps than those listed in a claim. Any reference signs do not limit the scope of the claims.
The invention can be implemented by means of both hardware and software. Several "means"
may be represented by the same item of hardware. Further, the invention lies in each and
every novel feature or combination of features.
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CLAIMS: 



1. Transmitter for transmitting a first and second digital information signal via a
transmission medium, said first digital information signal comprising first frames having at
least a first synchronization signal and a data portion stored in them, the transmitter
comprising:

- input means for receiving the first and second digital information signal;

- processing means for processing the second digital information signal into subsequent 
second frames, said second frames comprising blocks of information of the second digital 
information signal; 

- signal combination means for inserting a second synchronisation signal and at least the data 
portion of a first frame into a second frame of the second digital information signal so as to 
obtain a composite frame; 

- output means for supplying the composite frames to an output terminal so as to obtain a 
composite signal to be transmitted; 

characterized in that said signal combination means are adapted to strip the first 
synchronization signal from said first frames prior to inserting at least the data portion of the 
first frames into the second frames. 

2. Transmitter as claimed in claim 1, characterized in that the signal combination
means are adapted to insert the data portion of a first frame into a second frame of the second
digital information signal by using buried data techniques.

3. Transmitter as claimed in claim 1 or 2, characterized in that a second frame
represents a portion of the second digital information signal of a predefined duration and a
first frame represents a portion of a third digital information signal of substantially the same
duration.

4. Transmitter as claimed in claim 3, characterized in that the first digital
information signal is obtained by data compression of the third digital information signal.

5. Transmitter as claimed in claim 4, characterized in that the first digital
information signal is in the form of an MPEG encoded signal.

6. Transmitter as claimed in claim 4 or 5, characterized in that the transmitter
further comprises means for detecting the capacity available in a second frame to insert a first
frame and generating a control signal for controlling the data compression of the third digital
information signal, said control signal being indicative of the capacity available in said
second frame.

7. Transmitter as claimed in any one of the preceding claims, characterized in that the
second digital information signal comprises at least one PCM signal.

8. Transmitter as claimed in any one of the preceding claims, the transmitter
being in the form of an apparatus for recording the digital information signal on a record
carrier.

9. Transmitter as claimed in any one of the preceding claims, characterized in that
the transmitter further comprises channel-encoding means for channel encoding the
transmission signal prior to transmission.

10 Method of transmitting a first and second digital information signal via a 

transmission medium, said first digital information signal comprising first frames having at 
least a first synchronization signal and a data portion stored in them, the method comprising 
the steps: 

- receiving the first and second digital information signal; 

- processing the second digital information signal into subsequent second frames, said second 
frames comprising blocks of information of the second digital information signal: 

- inserting a second synchronization signal and at least the data portion of a first frame into a
second frame of the second digital information signal so as to obtain a composite frame;

- supplying the composite frames to an output terminal so as to obtain a composite signal to 
be transmitted; 

characterized in that the method further comprises the step of stripping the first synchronization
signal from said first frames prior to inserting at least the data portion of said first frames into
the second frames.

11. Method as claimed in claim 10, characterized in that at least the data
portion of a first frame is inserted into a second frame of the second digital information signal
by using buried data techniques. 
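
The steps of claims 10 and 11 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the sync bit patterns, the use of four LSBs per 16-bit PCM sample and all function names are hypothetical, and the dither and noise shaping normally associated with buried data are omitted.

```python
# All constants are illustrative assumptions; the claims do not fix them.
FIRST_SYNC = [1] * 12                   # assumed sync bits heading a first frame
SECOND_SYNC = [1, 0, 1, 0, 0, 1, 0, 1]  # assumed sync bits marking a composite frame
LSBS = 4                                # LSBs replaced per 16-bit sample


def strip_first_sync(first_frame):
    """Return the data portion of a first frame (a list of bits) without its sync."""
    if first_frame[:len(FIRST_SYNC)] != FIRST_SYNC:
        raise ValueError("first frame does not start with the first sync")
    return first_frame[len(FIRST_SYNC):]


def bury_bits(samples, bits):
    """Overwrite the LSBS lowest bits of successive samples with `bits`.

    The last group is zero-padded; samples beyond the payload are unchanged.
    """
    if len(bits) > len(samples) * LSBS:
        raise ValueError("payload exceeds the capacity of the second frame")
    mask = (1 << LSBS) - 1
    out = list(samples)
    for i in range(0, len(bits), LSBS):
        group = bits[i:i + LSBS]
        group = group + [0] * (LSBS - len(group))
        value = 0
        for bit in group:
            value = (value << 1) | bit
        index = i // LSBS
        out[index] = (out[index] & ~mask & 0xFFFF) | value
    return out


def make_composite_frame(first_frame, second_frame):
    """Strip the first sync, then bury the second sync plus the data portion."""
    payload = SECOND_SYNC + strip_first_sync(first_frame)
    return bury_bits(second_frame, payload)
```

Because the first synchronization signal is stripped before insertion, only the data portion and the shorter second sync consume buried capacity in this model.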

12. Method as claimed in claim 10 or 11, characterized in that a second frame 
represents a portion of the second digital information signal of a predefined duration and a 
first frame represents a portion of a third digital information signal of substantially the same
duration. 

13. Method as claimed in claim 12, characterized in that the first digital 
information signal is obtained by data compression of the third digital information signal.

14. Method as claimed in claim 13, characterized in that the first digital 
information signal is in the form of an MPEG encoded signal.

15. Transmission medium in the form of a record carrier carrying a composite 
signal comprising portions of a first and a second digital information signal, said composite 
signal being a sequence of composite frames, a composite frame comprising a second
synchronization signal and a data portion of a first frame of the first digital information
signal, said first frame comprising a first synchronization signal and a data portion, said
composite frame being obtained by inserting the second synchronization pattern and at least 
the data portion of the first digital information signal into a second frame of the second 
digital information signal, a second frame being obtained by processing the second digital
information signal into subsequent second frames, said second frames comprising blocks of
information of the second digital information signal, characterized in that prior to inserting at 
least the data portion of a first frame the first synchronization signal is stripped from said first 
frame. 

16. Transmission medium as claimed in claim 15, characterized in that at least the
data portion of a first frame is inserted in a second frame by using buried data techniques.

17. Transmission medium as claimed in claim 15 or 16, characterized in that a
second frame represents a portion of the second digital information signal of a predefined
duration and a first frame represents a portion of a third digital information signal of
substantially the same duration.



18. Transmission medium as claimed in claim 17, characterized in that the first
digital information signal is obtained by data compression of the third digital information 
signal. 

19. Transmission medium as claimed in claim 15, 16, 17 or 18, wherein the record
carrier is of the optical or magnetic recording type.

20. Receiver for receiving a composite signal and generating a first and a second
digital information signal therefrom, the receiver comprising:

- receiving means for receiving the composite signal; 

- first detection means for detecting a second synchronization signal so as to obtain a detection
signal; 

- retrieval means for retrieving a composite frame from the composite signal in response to 
said detection signal; 

- first extraction means for extracting at least a data portion of a first frame of the first digital 
information signal from the composite frame; 

- second extraction means for extracting at least a part of the second digital information 
signal from the composite frame so as to obtain a second frame of the second digital 
information signal; 

- first output means for subsequently supplying the second frames to a second output terminal 
so as to obtain the second digital information signal; 

characterized in that the receiver further comprises:

- synchronization signal generator means for generating a first synchronization signal; 

- signal combination means for combining the first synchronization signal and at least the
data portion of the first digital information signal so as to obtain a first frame of the first 
digital information signal; 

- second output means for subsequently supplying the first frames of the first digital 
information signal to a first output terminal so as to obtain the first digital information signal. 
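
The receiver of claim 20 can be sketched in the same spirit. The sketch assumes that the data portion sits in the four LSBs of 16-bit PCM samples directly behind a second sync pattern, that every composite frame has a fixed length, and that the data portion has a fixed number of bits. None of these choices, nor the names used, come from the claims.

```python
# All constants are illustrative assumptions; the claims do not fix them.
FIRST_SYNC = [1] * 12                   # sync regenerated by the receiver
SECOND_SYNC = [1, 0, 1, 0, 0, 1, 0, 1]  # sync that marks a composite frame
LSBS = 4                                # buried LSBs per 16-bit sample
DATA_BITS = 4                           # assumed length of a first frame's data portion


def read_buried_bits(samples):
    """Collect the LSBS lowest bits of each sample, most significant first."""
    bits = []
    for sample in samples:
        for shift in range(LSBS - 1, -1, -1):
            bits.append((sample >> shift) & 1)
    return bits


def receive(samples, frame_len):
    """Scan a stream of samples and return (first_frames, second_frames)."""
    first_frames, second_frames = [], []
    pos = 0
    while pos + frame_len <= len(samples):
        frame = samples[pos:pos + frame_len]
        bits = read_buried_bits(frame)
        if bits[:len(SECOND_SYNC)] == SECOND_SYNC:          # detection
            start = len(SECOND_SYNC)
            data_portion = bits[start:start + DATA_BITS]    # first extraction
            first_frames.append(FIRST_SYNC + data_portion)  # sync regenerated
            second_frames.append(list(frame))               # second extraction
            pos += frame_len
        else:
            pos += 1  # not aligned yet: slide by one sample
    return first_frames, second_frames
```

In this model the PCM samples of the composite frame are passed on unchanged as the second frame, which corresponds to the case where the second digital information signal is substantially the composite signal.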



21. Receiver as claimed in claim 20, characterized in that a second frame
represents a portion of the second digital information signal of a predefined duration and a
first frame represents a portion of a third digital information signal of substantially the same
duration.

22. Receiver as claimed in claim 20 or 21, characterized in that the first digital
information signal is a data compressed version of the third digital information signal.

23. Receiver as claimed in claim 20, 21 or 22, characterized in that the first digital
information signal is in the form of an MPEG encoded signal. 

24. Receiver as claimed in claim 20, 21, 22 or 23, characterized in that the
composite signal is a PCM signal and the second digital information signal is substantially
the composite signal.

25. Receiver as claimed in any one of the claims 20 to 24, which receiving device
takes the form of a device for reproducing a composite signal recorded on a record carrier.

26. Receiver as claimed in any one of the claims 20 to 25, characterized in that the
receiver comprises channel decoding means accommodated immediately after the receiving
means.